lucene-java-user mailing list archives

From jamie <ja...@stimulussoft.com>
Subject Re: Lucene TermsFilter lookup slow
Date Tue, 18 Aug 2015 07:19:17 GMT
Michael

Forgive me, I am not familiar with Lucene's internal code. Can you verify
whether these suggested changes are indeed correct?

I am changing line 210 of TermsFilter as follows:

if (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
    if (result == null) {
        // lazy init, but don't do it in the hot loop since we could read many docs
        result = new FixedBitSet(reader.maxDoc());
    }
    // each ID matches at most one doc, so record it and skip the loop below
    result.set(docs.docID());
}
// original loop commented out; not needed when every term matches a single doc
// while (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
//     result.set(docs.docID());
// }

This change seems to have very little impact on performance.

It is taking around 25 seconds to look up the documents associated with
murmur hash string IDs on an index of 10 million records.

Thanks in advance

Jamie

On 2015/08/10 2:46 PM, Michael McCandless wrote:
> OK, indeed, that version has the changes I was thinking of,
> specifically optimizing the case when only a single doc contains a
> term by inlining that docID into the terms dict.
>
> You should be able to improve on TermsFilter a bit because you know
> only 1 doc matches each ID, so after the first segment finds a given
> ID you should stop testing other segments.  Also, since you are doing
> bulk lookup, you should pre-sort the IDs so it's a sequential scan
> through the terms dict.
>
> There is another thread right now, subject "Mapping doc values back to
> doc ID (in decent time)", also talking about how to do faster PK
> lookups.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
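
For reference, here is a rough sketch of the kind of bulk lookup Mike describes
above: pre-sort the IDs so each segment's terms dictionary is scanned
sequentially, and stop checking further segments once an ID has been found.
This is only an illustration, not the actual TermsFilter code; it assumes a
recent Lucene API where Terms.iterator() takes no argument and
TermsEnum.postings(PostingsEnum, int) takes no liveDocs parameter (older
4.x/5.x versions use termsEnum.docs(...) instead), the class and method names
are made up, and deleted documents are not handled.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

public class BulkPkLookup {

    /** Maps each ID to its global docID; IDs that are not found are simply absent. */
    public static Map<String, Integer> lookup(IndexReader reader, String idField,
                                              List<String> ids) throws IOException {
        // Pre-sort so the seeks walk each segment's terms dictionary in order.
        List<String> sorted = new ArrayList<>(ids);
        Collections.sort(sorted);

        Map<String, Integer> found = new HashMap<>();
        for (LeafReaderContext ctx : reader.leaves()) {
            Terms terms = ctx.reader().terms(idField);
            if (terms == null) {
                continue; // this segment has no terms for the ID field
            }
            TermsEnum termsEnum = terms.iterator();
            PostingsEnum postings = null;
            for (String id : sorted) {
                if (found.containsKey(id)) {
                    continue; // already found in an earlier segment; IDs are unique
                }
                if (termsEnum.seekExact(new BytesRef(id))) {
                    postings = termsEnum.postings(postings, PostingsEnum.NONE);
                    if (postings.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                        // Only one doc can contain this ID, so take it and move on.
                        found.put(id, ctx.docBase + postings.docID());
                    }
                }
            }
        }
        return found;
    }
}

The segment loop is kept on the outside so a TermsEnum is created once per
segment rather than once per ID, and the "already found" check is what
implements the stop-after-first-segment idea.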

