lucene-dev mailing list archives

From Marvin Humphrey <>
Subject Re: [jira] Created: (LUCENE-851) Pruning
Date Wed, 04 Apr 2007 15:33:35 GMT

On Mar 29, 2007, at 7:44 PM, Ning Li wrote:

> If a query requires top-K results, isn't it
> sufficient to find top-K results in each segment and merge them to
> return the overall top-K results?

They are merged by collecting them into a HitQueue.
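To illustrate the per-segment top-K plus merge step, here is a minimal Python sketch. It is not the Lucene or KinoSearch code. It assumes each hit is a (score, doc_id) pair whose doc_id is already global (segment base plus local id). The function names are hypothetical, and a bounded min-heap stands in for HitQueue. Ties are broken in favor of the lower doc id, which is a choice made for this sketch.

```python
import heapq
from typing import List, Tuple

# A hit is (score, doc_id); doc_id is global: segment base + local id.
Hit = Tuple[float, int]


def top_k_per_segment(scores: List[float], doc_base: int, k: int) -> List[Hit]:
    """Return the k best hits in one segment, best first (hypothetical helper)."""
    hits = ((score, doc_base + local_id) for local_id, score in enumerate(scores))
    # Higher score wins; on equal scores the lower doc id wins.
    return heapq.nlargest(k, hits, key=lambda h: (h[0], -h[1]))


def merge_top_k(segment_results: List[List[Hit]], k: int) -> List[Hit]:
    """Merge per-segment top-k lists into an overall top-k (HitQueue stand-in)."""
    # Min-heap keyed on (score, -doc_id): the root is the current worst hit.
    queue: List[Tuple[float, int]] = []
    for hits in segment_results:
        for score, doc_id in hits:
            entry = (score, -doc_id)
            if len(queue) < k:
                heapq.heappush(queue, entry)
            elif entry > queue[0]:
                # The new hit beats the worst retained hit, so swap it in.
                heapq.heapreplace(queue, entry)
    # Largest entries first, converting -doc_id back to doc_id.
    return [(score, -neg_id) for score, neg_id in sorted(queue, reverse=True)]
```

No segment can contribute more than K hits to the overall top K, so keeping only K per segment loses nothing. For example, segments scored [0.5, 2.0, 1.0] (base 0) and [3.0, 0.1] (base 3) merge with k=2 to [(3.0, 3), (2.0, 1)].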

> Early termination happens in
> finding top-K results in one segment. Assuming each document has a
> static score, document ids are assigned in the same order of their
> static scores within a segment. If a top-K query is scored by the same
> static score, query processing on a segment can stop as soon as the
> first K results are found.

Indeed, that's exactly how the loop in Scorer_collect() works.
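This Python sketch shows the early-termination idea. It is not a transcription of Scorer_collect(), whose source is not shown here. It assumes doc ids within a segment were assigned in descending order of a static score, so ascending doc id order is also rank order. It also assumes the matching doc ids arrive in ascending order, as they would from a posting-list scorer. The function name is hypothetical.

```python
from typing import Iterable, List


def collect_early_terminating(matching_doc_ids: Iterable[int], k: int) -> List[int]:
    """Collect the top k matches from one segment, stopping as soon as k are found.

    Assumes doc id order equals static-score order within the segment, so the
    first k matching docs are the top k and no later doc can outrank them.
    """
    if k <= 0:
        return []
    hits: List[int] = []
    for doc_id in matching_doc_ids:
        hits.append(doc_id)
        if len(hits) == k:
            break  # Early termination: the remaining postings are never read.
    return hits
```

The saving comes from the `break`. Any postings after the K-th match are never visited, which is only valid because the index-time doc id order matches the query's ranking.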

> As to the indexing side, applications should be able to pick such a
> static score? If Lucene score function is used, norm is a good
> candidate? (One tricky thing with norm is that it is updatable.)

I would argue that only a single mechanism based on indexed, non-tokenized
fields should be used to determine sort order. Sort order based upon
norms is easy for the user to fake using a dedicated field at a small
cost, so library-level support is not needed.

Marvin Humphrey
Rectangular Research
