- Reduce the number of rows on which Oracle will operate to execute the query
- Reduce the size of each row
- Keep the most frequently accessed rows in memory
To achieve this, I made the following optimizations:
1. Added the "FIRST_ROWS(n)" hint to the query.
This Oracle documentation suggests that it should improve performance:
http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96517/aoptim.htm#1009023
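As a rough sketch, the hinted query looks something like this (the DOCUMENTS table, TEXT column & score label are placeholders, not our actual schema):

  -- Tell the optimizer to return the best 25 hits quickly;
  -- table & column names here are illustrative only.
  SELECT /*+ FIRST_ROWS(25) */ doc_id, SCORE(1)
    FROM documents
   WHERE CONTAINS(text, 'optimization', 1) > 0
   ORDER BY SCORE(1) DESC;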
2. Generated a report on the context index to discover any scope for optimization
The report can be generated as described in the following section:
http://download-east.oracle.com/docs/cd/B14117_01/text.101/b10730/crptpkg.htm#i996935
This report gave very useful insights, particularly:
- Number of $I rows, which was very large
- Most frequent tokens
- Fragmentation level of the index
Many of our queries searched on codes within the documents that have "/" as part of the code. From this report, I found that by default Oracle breaks such a code into two tokens. For example, "A/B", which we intended to be stored as a single token, was broken into the two tokens "A" & "B". As a result, Oracle has to work on many more $I rows when we search on "A/B". Taking a lead from this, I defined a printjoin for "/" and a custom lexer that uses the printjoin, and rebuilt the index.
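A minimal sketch of that setup (the preference, table & index names are placeholders):

  BEGIN
    -- A BASIC_LEXER preference that treats '/' as part of a token,
    -- so 'A/B' is indexed as one token instead of two.
    ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
    ctx_ddl.set_attribute('my_lexer', 'printjoins', '/');
  END;
  /

  -- Recreate the context index so it picks up the new lexer.
  DROP INDEX documents_text_idx;
  CREATE INDEX documents_text_idx ON documents(text)
    INDEXTYPE IS CTXSYS.CONTEXT
    PARAMETERS ('LEXER my_lexer');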
3. Disabled "storage in row" for the CLOB
By default, Oracle stores CLOB data inline, i.e. in the row itself, if its size is less than 4K. So whenever Oracle fetches a row, it fetches all of the CLOB data as well. In our case most of the search text was less than 4K, so most of it was stored inline. This was one of the major contributors to the I/O. I disabled storage in row, which greatly helped in reducing the size of each individual row as well as the amount of I/O required. This is suggested in the following section of the Oracle Text documentation:
http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96517/ind.htm#1016214
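Assuming the CLOB lives in a TEXT column on a DOCUMENTS table (placeholder names), this is a one-statement change:

  -- Move the CLOB data out of line; a row fetch then returns only the
  -- LOB locator instead of the CLOB contents.
  ALTER TABLE documents MOVE LOB (text) STORE AS (DISABLE STORAGE IN ROW);
  -- Note: MOVE marks the table's indexes UNUSABLE; rebuild them afterwards.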
4. Partitioned the table by range and created a partitioned context index (search index)
I partitioned the table and the index by range, as suggested by the following documentation:
http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96517/aoptim.htm#1009230
This helped reduce the number of rows on which Oracle operates by leveraging partition pruning. It also reduced the effective size of the search index, since we limited searches to a single partition.
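A minimal sketch of the layout, with placeholder names and a made-up partition key (CREATED_DATE):

  CREATE TABLE documents (
    doc_id       NUMBER,
    created_date DATE,
    text         CLOB
  )
  PARTITION BY RANGE (created_date) (
    PARTITION p2007 VALUES LESS THAN (TO_DATE('01-JAN-2008', 'DD-MON-YYYY')),
    PARTITION p2008 VALUES LESS THAN (TO_DATE('01-JAN-2009', 'DD-MON-YYYY')),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
  );

  -- A LOCAL context index gets one index partition per table partition.
  CREATE INDEX documents_text_idx ON documents(text)
    INDEXTYPE IS CTXSYS.CONTEXT
    LOCAL
    PARAMETERS ('LEXER my_lexer');

  -- Filtering on the partition key lets the optimizer prune to one partition.
  SELECT doc_id
    FROM documents
   WHERE CONTAINS(text, 'A/B', 1) > 0
     AND created_date >= TO_DATE('01-JAN-2008', 'DD-MON-YYYY');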
5. Defined a keep pool equal to the size of the documents table & configured the documents table to be kept in the keep pool.
Most of the previous optimizations helped greatly reduce the amount of I/O by reducing the size of the table and by breaking the table & search index into multiple partitions. Now that the working set is much smaller, caching all of it in memory can improve performance significantly.
This was achieved by defining a keep pool big enough to store the latest partitions of the table & the index. More details on how this can be done, as well as how the index can be preloaded, can be found in the following article:
http://www.oracle.com/technology/products/text/htdocs/mem_load.html
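A sketch of the configuration (the size & names below are placeholders; Oracle Text stores a context index's tokens in a DR$<index_name>$I table, with slightly different naming for partitioned indexes):

  -- Size the KEEP pool (placeholder size).
  ALTER SYSTEM SET db_keep_cache_size = 500M SCOPE = BOTH;

  -- Direct the table's blocks & the index's $I token table's blocks
  -- to the KEEP pool.
  ALTER TABLE documents STORAGE (BUFFER_POOL KEEP);
  ALTER TABLE dr$documents_text_idx$i STORAGE (BUFFER_POOL KEEP);

  -- Warm the cache with one full scan so subsequent searches hit memory.
  SELECT /*+ FULL(d) */ COUNT(*) FROM documents d;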
Results
The results were impressive. We did a load test with twice the production load, i.e. 24 search requests per minute. Before the optimizations, the system crashed within 500 search requests, and the successful requests had an average response time of 5 minutes. After the final optimizations, searches completed with an average response time of 350 ms.
9 comments:
Nice blog. I have bookmarked it for detailed reading, but one comment: have you thought of using Lucene along with the Oracle DB for the searches? Or rather, how feasible would this be? I think you should be able to use the DB for manipulating the index with the Lucene engine.
Thanks Nitish.
I do not have much expertise on Lucene, so I won't be able to comment on your suggestion.
Regards,
Samarth
We implemented almost the same approach. Our table has 15M records, so even after partitioning the table into 8 partitions and creating local text indexes, the performance is not sub-second. Moreover, you cannot rebuild partitioned indexes online, which makes it more difficult to reduce fragmentation, as it requires downtime for the application. I never tried Lucene either, but the performance problems with Oracle Text are driving me to give it a try.
Thanks
Subbarao
Just want to add that we use a DQM kind of framework for searching and regularly run optimize index to reduce fragmentation. But I want to let you know that optimize index doesn't completely defragment the index; you need to rebuild or recreate the index often. I observed a 50% reduction in index size after a rebuild. Performance is good for a few days, but after a month or so the index gets badly fragmented again in spite of running the optimize index programs.
Hi Subba,
In our case I/O was the main problem, and what helped us the most was to move the CLOB storage "out of line".
This ensured that when the search query executes, it fetches only the references to the CLOB data instead of pulling the complete CLOB data into memory. This helped us a lot in reducing the I/O and getting sub-second performance.
Also, it would be interesting to know what kind of partitioning you have implemented.
Range partitioning on a column which is present in the "where clause" definitely helps a lot.
Regards,
Samarth
You are right. We did range partitioning on a column and used it in the where clause (for sure). FYI, a local text index can be created on range-partitioned tables only. You cannot create a local text index on list-partitioned or hybrid-partitioned (also sub-partitioned) tables.
Excellent post. Thanks!
Nice blog and some good tips about partitioning! However, you basically made the search "in memory" by keeping your text table/index in the keep pool and making your keep pool large. In-memory is obviously going to be fast, so we are comparing apples to oranges by comparing an in-memory Oracle Text search to Lucene.