lucene-dev mailing list archives

From Michael Busch <busch...@gmail.com>
Subject Re: [jira] Commented: (LUCENE-1195) Performance improvement for TermInfosReader
Date Fri, 23 May 2008 07:44:03 GMT
Oops, I added this comment to the wrong issue... too many open browser
tabs... :)

Michael Busch (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/LUCENE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599297#action_12599297 ]
> 
> Michael Busch commented on LUCENE-1195:
> ---------------------------------------
> 
> {quote}
> Using the deprecated method would have the advantage that it (the whole wrapper class in fact) would _have_ to be removed in 3.0.
> {quote}
> 
> Thanks for reviewing! You're right, I will change it to use the deprecated method and also deprecate the wrapper class itself.
> 
> 
>> Performance improvement for TermInfosReader
>> -------------------------------------------
>>
>>                 Key: LUCENE-1195
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-1195
>>             Project: Lucene - Java
>>          Issue Type: Improvement
>>          Components: Index
>>            Reporter: Michael Busch
>>            Assignee: Michael Busch
>>            Priority: Minor
>>             Fix For: 2.4
>>
>>         Attachments: lucene-1195.patch, lucene-1195.patch, lucene-1195.patch
>>
>>
>> Currently we have a bottleneck for multi-term queries: the dictionary lookup is being done
>> twice for each term. The first time in Similarity.idf(), where searcher.docFreq() is called.
>> The second time when the posting list is opened (TermDocs or TermPositions).
>> The dictionary lookup is not cheap, which is why a significant performance improvement is
>> possible here if we avoid the second lookup. An easy way to do this is to add a small LRU
>> cache to TermInfosReader.
>> I ran some performance experiments with an LRU cache size of 20 and a mid-size index of
>> 500,000 documents from Wikipedia. Here are some test results:
>> 50,000 AND queries with 3 terms each:
>> old:                  152 secs
>> new (with LRU cache): 112 secs (26% faster)
>> 50,000 OR queries with 3 terms each:
>> old:                  175 secs
>> new (with LRU cache): 133 secs (24% faster)
>> For bigger indexes this patch will probably have less impact; for smaller ones, more.
>> I will attach a patch soon.
> 
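
For readers of the archive, the idea in the issue description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from lucene-1195.patch: the class names (TermInfoCacheSketch, LruCache, Dictionary) and the slowLookup() stand-in are hypothetical, and the real TermInfosReader internals are not shown. The sketch uses java.util.LinkedHashMap in access order, so the second lookup of a term within the same query is served from a small cache instead of hitting the dictionary again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a small LRU cache in front of a costly term lookup. */
public class TermInfoCacheSketch {

    /** LRU map: drops the least recently accessed entry once capacity is exceeded. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // third argument: accessOrder = true
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    /** Stand-in for the term dictionary; counts how often the slow path runs. */
    static final class Dictionary {
        int lookups = 0;
        private final LruCache<String, Integer> cache;

        Dictionary(int cacheSize) {
            this.cache = new LruCache<>(cacheSize);
        }

        /** Assumption: placeholder for the expensive on-disk dictionary lookup. */
        private int slowLookup(String term) {
            lookups++;
            return term.length(); // dummy value in place of real term info
        }

        /** Returns the cached value if present, otherwise does the slow lookup. */
        int get(String term) {
            Integer cached = cache.get(term);
            if (cached != null) {
                return cached;
            }
            int value = slowLookup(term);
            cache.put(term, value);
            return value;
        }
    }

    public static void main(String[] args) {
        Dictionary dict = new Dictionary(20); // cache size 20, as in the experiments
        String[] queryTerms = {"apache", "lucene", "index"};
        for (String t : queryTerms) dict.get(t); // first pass, e.g. for docFreq
        for (String t : queryTerms) dict.get(t); // second pass, e.g. opening postings
        System.out.println("dictionary lookups: " + dict.lookups); // prints 3, not 6
    }
}
```

Note that LinkedHashMap is not thread-safe, so a cache shared between search threads would need synchronization or a per-thread instance; which approach the actual patch takes is not stated in this message.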



