nutch-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Lucene performance bottlenecks
Date Thu, 08 Dec 2005 16:59:14 GMT
Piotr Kosiorowski wrote:

>Hi,
>I started thinking some time ago about implementing a special kind of Lucene
>Query optimized for Nutch (if I remember correctly, I would have to write my
>own Scorer and probably a few other classes). I assumed that with a
>specialized query I would be able to avoid accessing some of the Lucene index
>structures multiple times, since the same term appears many times in the query
>generated by Nutch for multi-token queries. I am not a Lucene expert, but
>maybe it is worth checking whether it might give some performance boost. Does
>anyone have any ideas why it might help or not?

That's a very good comment. Looking at the profile traces I can see that 
a lot of time is spent just juggling the sub-query scorers inside the 
BooleanScorer, and handling the complex query structure; if this part 
could be optimized by the use of a special scorer, it could be a big win.
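
To make the repeated-term point concrete, here is a rough toy model in plain Java. It does not use any Lucene or Nutch classes. The expansion shape (one term clause per field plus a phrase clause per field), the field names, and the helpers `expand`, `readPostings`, `naiveLookups` and `cachedLookups` are all assumptions made up for illustration. It only counts how many index lookups a clause-by-clause evaluation would issue compared with one that shares the lookup for each distinct field:term pair.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: no Lucene or Nutch classes are involved.
class TermLookupSketch {

    // Counts calls to the (hypothetical) expensive index lookup.
    static int lookups = 0;

    // Hypothetical stand-in for reading the postings of one field:term pair.
    static int[] readPostings(String fieldAndTerm) {
        lookups++;
        return new int[0]; // placeholder, a real index would return doc ids
    }

    // Assumed expansion: every token becomes a term clause in every field,
    // and a multi-token query also gets a phrase clause per field, which
    // needs the postings of every token again.
    static List<String> expand(String[] fields, String[] tokens) {
        List<String> clauses = new ArrayList<>();
        for (String field : fields) {
            for (String token : tokens) {
                clauses.add(field + ":" + token); // term clause
            }
        }
        if (tokens.length > 1) {
            for (String field : fields) {
                for (String token : tokens) {
                    clauses.add(field + ":" + token); // phrase component
                }
            }
        }
        return clauses;
    }

    // Each clause reads its postings independently.
    static int naiveLookups(List<String> clauses) {
        lookups = 0;
        for (String clause : clauses) {
            readPostings(clause);
        }
        return lookups;
    }

    // A specialized evaluation could read each distinct pair once and share it.
    static int cachedLookups(List<String> clauses) {
        lookups = 0;
        Map<String, int[]> cache = new HashMap<>();
        for (String clause : clauses) {
            cache.computeIfAbsent(clause, TermLookupSketch::readPostings);
        }
        return lookups;
    }
}
```

With three fields and two tokens this model issues 12 lookups clause by clause and 6 when lookups are shared. Whether a real Scorer would see a comparable saving depends on how much of the time is spent in term access rather than in the scorer bookkeeping itself.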

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


