lucene-dev mailing list archives

From Andrzej Bialecki
Subject Performance issues with ConjunctionScorer
Date Tue, 22 Nov 2005 11:49:45 GMT

I've been profiling a Nutch installation, and to my surprise the largest 
amount of throwaway allocations and the most time spent were not in 
Nutch-specific code or IPC, but in Lucene's ConjunctionScorer.doNext() 
method. This method operates on a LinkedList, which seems to be a huge 
bottleneck. Perhaps it would be possible to replace the LinkedList with 
a plain array?
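
To make the array idea concrete, here is a rough sketch, not Lucene's 
actual code. DocCursor is a hypothetical stand-in for a Scorer (a cursor 
over a sorted list of doc IDs), and ArrayConjunction and intersect() are 
names I made up for illustration. The point is only that the "move first 
to last" rotation can be done by advancing an index into a fixed array, 
which avoids allocating list nodes on every step:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ArrayConjunction {

    /** Hypothetical stand-in for a Scorer: a cursor over sorted doc IDs. */
    static final class DocCursor {
        private final int[] docs;
        private int pos = -1;

        DocCursor(int[] sortedDocs) { this.docs = sortedDocs; }

        int doc() { return docs[pos]; }

        /** Moves to the next doc; returns false when exhausted. */
        boolean next() { return ++pos < docs.length; }

        /** Moves to the first doc >= target; returns false when exhausted. */
        boolean skipTo(int target) {
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length;
        }
    }

    /** Returns the doc IDs present in every postings list. */
    static List<Integer> intersect(int[][] postings) {
        List<Integer> hits = new ArrayList<>();
        int n = postings.length;
        if (n == 0) return hits;

        DocCursor[] cursors = new DocCursor[n];
        for (int i = 0; i < n; i++) {
            cursors[i] = new DocCursor(postings[i]);
            if (!cursors[i].next()) return hits; // an empty list means no matches
        }
        // Order by current doc, so the "first" slot holds the smallest doc
        // and the "last" slot holds the largest.
        Arrays.sort(cursors, (a, b) -> Integer.compare(a.doc(), b.doc()));

        int first = 0; // index of the logical head of the circular array
        boolean more = true;
        while (more) {
            int last = (first + n - 1) % n;
            // Skip the smallest cursor up to the largest, then rotate by
            // bumping the index instead of removeFirst()/addLast().
            while (more && cursors[first].doc() < cursors[last].doc()) {
                more = cursors[first].skipTo(cursors[last].doc());
                last = first;
                first = (first + 1) % n;
            }
            if (more) {
                hits.add(cursors[first].doc()); // all cursors agree on this doc
                more = cursors[last].next();    // move past the match
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 3, 5, 7, 9}, {3, 4, 5, 9}, {0, 3, 9, 12} };
        System.out.println(intersect(postings)); // prints [3, 9]
    }
}
```

The array is allocated and sorted once per query; after that the inner 
loop only touches two indices. Whether this removes the bottleneck in 
practice would have to be confirmed by re-running the profiler.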

The Nutch Summarizer also needlessly re-tokenizes the text over and over 
again - perhaps it would be better to save the already tokenized text in 
parse_text, instead of the raw plain text? After all, the only use for 
that text is to index it and then build the summaries.

Please see the profiles here:

Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com
