lucene-dev mailing list archives

From "Abdul Chaudhry (JIRA)" <>
Subject [jira] Commented: (LUCENE-443) ConjunctionScorer tune-up
Date Tue, 11 Oct 2005 06:37:11 GMT

Abdul Chaudhry commented on LUCENE-443:

OK, this makes sense, since the scoring engine runs something like this:

while (scorer.next()) {            // advance to the next matching document
  int doc = scorer.doc();
  float score = scorer.score();
  collector.collect(doc, score);
}

That is, next() will already have put the sub-scorers in order, so by the time we call
scorer.score() everything should be in order.

Thanks, I'll give that a go.

The impression I have of Lucene, and correct me if I'm wrong, is that complex queries with
many terms and clauses have their performance bottleneck in the ordering phase: everything
is required to be in document order, and all the scorer sub-engines must comply. Collection
is not a concern there, as you probably have a small number of hits. At the other end of the
scale, for queries with one or two very high-frequency terms, the bottleneck is really in
collection, that is, the priority queue in collector.collect(). Essentially this is a
sorting issue, somewhat masked and manipulated at various stages. It looks to me like
Lucene needs a "query plan".
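
To make the collection cost concrete, here is a rough sketch of a bounded top-N collector
built on java.util.PriorityQueue. The class and method names (TopHitsSketch, Hit, topDocs)
are made up for illustration and are not Lucene's own collector or hit queue; the point is
only that collect() runs once per hit, so a high-frequency term pays for it many times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-N collector sketch; not Lucene's actual collector classes. */
class TopHitsSketch {
    /** A (doc, score) pair. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int maxHits;
    // Min-heap on score: the weakest retained hit sits at the head.
    private final PriorityQueue<Hit> queue;

    /** maxHits must be at least 1. */
    TopHitsSketch(int maxHits) {
        this.maxHits = maxHits;
        this.queue = new PriorityQueue<>(maxHits, (a, b) -> Float.compare(a.score, b.score));
    }

    /** Called once per matching document, so total cost grows with the hit count. */
    void collect(int doc, float score) {
        if (queue.size() < maxHits) {
            queue.add(new Hit(doc, score));
        } else if (score > queue.peek().score) {
            queue.poll();                      // evict the weakest retained hit
            queue.add(new Hit(doc, score));
        }
    }

    /** Returns the retained doc ids, best score first. */
    List<Integer> topDocs() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) {
            docs.add(h.doc);
        }
        return docs;
    }
}
```

Each insertion into a full queue costs O(log N) for N retained hits, which is cheap per call
but adds up when a one-term query matches a large fraction of the index.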

> ConjunctionScorer tune-up
> -------------------------
>          Key: LUCENE-443
>          URL:
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Versions: 1.9
>  Environment: Linux, Java 1.5, large index with 4 million items and some heavily nested boolean queries
>     Reporter: Abdul Chaudhry
>  Attachments:,
> I recently ran a load test on the latest code from Lucene, which uses a new
> BooleanScore, and noticed that ConjunctionScorer was crunching through objects, especially
> while sorting as part of the skipTo call. It turns a linked list into an array, sorts the
> array, then converts the array back to a linked list for further processing by the scoring
> engines below.
> I'm not sure if anyone else is experiencing this, as I have a very large index (> 4
> million items) and I am issuing some heavily nested queries.
> Anyway, I decided to change the linked list into an array and use a first and last marker
> to "simulate" a linked list.
> This scaled much better during my load test, as the Java garbage collector was less -
> umm - virulent.
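
The array-with-markers idea from the quoted description could look roughly like the sketch
below. This is not the attached patch: ArrayConjunctionSketch and DocIterator are
hypothetical stand-ins (DocIterator just walks a sorted int array), and the sketch assumes
the sub-scorers are sorted once by current document and then kept in circular order by
moving the first and last indices instead of relinking list nodes.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch; names and structure are assumptions, not the LUCENE-443 patch. */
class ArrayConjunctionSketch {
    /** Minimal stand-in for a sub-scorer over a sorted array of doc ids. */
    static final class DocIterator {
        private final int[] docs;
        private int pos = -1;
        DocIterator(int... sortedDocs) { this.docs = sortedDocs; }
        int doc() { return docs[pos]; }
        boolean next() { return ++pos < docs.length; }
        /** Advances to the first doc >= target; returns false when exhausted. */
        boolean skipTo(int target) {
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length;
        }
    }

    private final DocIterator[] scorers;
    private int first;        // index of the scorer with the smallest current doc
    private int last;         // index of the scorer with the largest current doc
    private boolean more;
    private boolean started;

    ArrayConjunctionSketch(DocIterator... scorers) { this.scorers = scorers; }

    private void init() {
        more = scorers.length > 0;
        for (DocIterator s : scorers) {
            if (!s.next()) {
                more = false;                  // an empty clause means no matches at all
            }
        }
        if (more) {
            // One-time sort by current doc; afterwards only the markers move.
            Arrays.sort(scorers, Comparator.comparingInt(DocIterator::doc));
        }
        first = 0;
        last = scorers.length - 1;
        started = true;
    }

    /** Moves to the next doc present in every sub-scorer; false when there is none. */
    boolean next() {
        if (!started) {
            init();
        } else if (more) {
            more = scorers[last].next();       // step past the previous match
        }
        while (more && scorers[first].doc() < scorers[last].doc()) {
            more = scorers[first].skipTo(scorers[last].doc());
            last = first;                      // the old first is now the largest
            first = (first + 1) % scorers.length;
        }
        return more;
    }

    /** Current matching doc; only valid after next() has returned true. */
    int doc() { return scorers[first].doc(); }
}
```

Because the rotation only updates two ints, no list nodes or temporary arrays are allocated
per call, which is the kind of garbage the description above is complaining about.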

This message is automatically generated by JIRA.
