nutch-dev mailing list archives

From Andrzej Bialecki
Subject Re: IndexOptimizer (Re: Lucene performance bottlenecks)
Date Wed, 14 Dec 2005 10:06:46 GMT
Doug Cutting wrote:

> Andrzej Bialecki wrote:
>> Ok, I just tested IndexSorter for now. It appears to work correctly, 
>> at least I get exactly the same results, with the same scores and the 
>> same explanations, if I run the same queries on the original and on 
>> the sorted index.
> Here's a more complete version, still mostly untested.  This should 
> make searches faster.  We'll see how good the results are...
> This includes a patch to Lucene to make it easier to write hit 
> collectors that collect TopDocs.
> I'll test this on a 38M document index tomorrow.

I'll test it soon - one comment, though. Currently you use a subclass of 
RuntimeException to stop the collecting. I think we should come up with 
a better mechanism - throwing exceptions is too costly. Perhaps the 
HitCollector.collect() method should return a boolean to signal whether 
the searcher should continue working.
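
To make the suggestion concrete, here is a rough sketch of what I mean. It does not use the real Lucene classes: StoppableCollector, LimitedCollector and search() are hypothetical names invented for illustration, and the loop is only a stand-in for the searcher's scoring loop. LimitedCollector assumes maxHits is at least 1.

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyStopSketch {

    /** Hypothetical interface; Lucene's HitCollector.collect() returns void. */
    interface StoppableCollector {
        /** Returns true to keep collecting, false to ask the searcher to stop. */
        boolean collect(int doc, float score);
    }

    /** Collects up to maxHits documents (assumed >= 1), then signals a stop. */
    static final class LimitedCollector implements StoppableCollector {
        private final int maxHits;
        final List<Integer> docs = new ArrayList<>();

        LimitedCollector(int maxHits) {
            this.maxHits = maxHits;
        }

        @Override
        public boolean collect(int doc, float score) {
            docs.add(doc);
            // Continue only while there is still room for more hits.
            return docs.size() < maxHits;
        }
    }

    /** Stand-in for the searcher loop; returns how many documents were visited. */
    static int search(int[] docs, float[] scores, StoppableCollector collector) {
        int visited = 0;
        for (int i = 0; i < docs.length; i++) {
            visited++;
            if (!collector.collect(docs[i], scores[i])) {
                break; // the collector asked us to stop, no exception needed
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        int[] docs = {0, 1, 2, 3, 4};
        float[] scores = {1.0f, 0.9f, 0.8f, 0.7f, 0.6f};
        LimitedCollector collector = new LimitedCollector(3);
        int visited = search(docs, scores, collector);
        System.out.println(visited + " visited, collected " + collector.docs);
    }
}
```

The point of the design is that the stop signal travels through the normal return path, so the searcher exits its loop with a plain break instead of unwinding the stack.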

Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com
