nutch-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: IndexOptimizer (Re: Lucene performance bottlenecks)
Date Tue, 13 Dec 2005 06:58:50 GMT

Doug Cutting wrote:

> Andrzej Bialecki wrote:
>
>> By all means please start, this is still near the limits of my 
>> knowledge of Lucene... ;-)
>
>
> Attached is a class which sorts a Nutch index by boost.  I have only 
> tested it on a ~100 page index, where it appears to work correctly. 
> Please tell me how it works for you.


Shouldn't this be combined with a HitCollector that collects only the 
first-n matches? Otherwise we still need to scan the whole posting list...
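
To make the idea concrete, here is a minimal sketch of a collector that stops after the first n matches. It does not use the Lucene API: the Collector interface, LimitReached, FirstNCollector and scan() are hypothetical names made up for illustration, and the posting list is just an int array. It assumes that in a boost-sorted index the lowest doc ids carry the highest boosts, so the first n matches would be the ones worth keeping, and it assumes n >= 1.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstNCollectorSketch {

    /** Hypothetical stand-in for a hit collector callback; not the Lucene class. */
    interface Collector {
        void collect(int doc, float score);
    }

    /** Thrown by the collector to abort the scan once it has enough hits. */
    static class LimitReached extends RuntimeException {}

    /** Keeps the first `limit` hits it sees, then signals the scan to stop. */
    static class FirstNCollector implements Collector {
        private final int limit;               // assumed to be >= 1
        final List<Integer> docs = new ArrayList<>();

        FirstNCollector(int limit) { this.limit = limit; }

        @Override
        public void collect(int doc, float score) {
            docs.add(doc);
            if (docs.size() >= limit) {
                throw new LimitReached();
            }
        }
    }

    /** Walks a posting list in doc-id order; returns how many postings were visited. */
    static int scan(int[] postings, Collector collector) {
        int visited = 0;
        try {
            for (int doc : postings) {
                visited++;
                collector.collect(doc, 1.0f);  // constant score, scoring is out of scope here
            }
        } catch (LimitReached stop) {
            // The collector has enough hits; the rest of the list is never read.
        }
        return visited;
    }

    public static void main(String[] args) {
        int[] postings = {0, 3, 4, 9, 12, 20, 31};
        FirstNCollector collector = new FirstNCollector(3);
        int visited = scan(postings, collector);
        System.out.println(collector.docs + " after visiting "
                + visited + " of " + postings.length + " postings");
    }
}
```

Without the limit, scan() would visit all seven postings; with it, only the first three are touched. Whether aborting the search from inside the collector like this is acceptable in the real searcher is an open question.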

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


