lucene-dev mailing list archives

From David Birtwell <>
Subject Re: Question: using boost for sorting
Date Fri, 18 Oct 2002 13:56:26 GMT
Doug Cutting wrote:

> David Birtwell wrote:
>> To enable this, Similarity could have a method like:
>>    float applyNorm(float baseScore)
>> which could optionally ignore baseScore and modify the scorer classes 
>> to do:
>>    score = applyNorm(score)
>> instead of the
>>    score *= Similarity.decodeNorm()
> That would add a method call in the innermost search loop, which would 
> probably have a noticeable performance impact.  
> (Similarity.decodeNorm() is a simple static method that JITs can 
> trivially inline.)  Couldn't you achieve the same effect by overriding 
> the normalizeLength() method and/or use Field.setBoost() to impact the 
> value that is stored in the norm file?  That way this computation is 
> performed at index time rather than at search time.
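A minimal sketch of the index-time alternative described above, assuming a hypothetical linear one-byte norm encoding (NORM_TABLE, encodeNorm() and indexTimeNorm() here are illustrative and are not Lucene's actual encoding or API). It assumes a 1/sqrt(numTerms) length factor multiplied by a per-document boost; the product is quantized into the stored byte, so search time pays only for a table lookup and a multiply.

```java
/**
 * Sketch: fold a per-document boost into the one-byte norm at index time.
 * ASSUMPTION: the linear 0..255 encoding is hypothetical, not Lucene's own.
 */
final class IndexTimeNorm {
    // Precomputed decode table: decoding is a single array load.
    private static final float[] NORM_TABLE = new float[256];
    static {
        for (int i = 0; i < 256; i++) {
            NORM_TABLE[i] = i / 255.0f;
        }
    }

    /** Hypothetical encoding: clamp to [0, 1] and quantize to 256 levels. */
    static byte encodeNorm(float f) {
        float clamped = Math.max(0.0f, Math.min(1.0f, f));
        return (byte) Math.round(clamped * 255.0f);
    }

    /** Mask with 0xFF so bytes above 127 decode as 128..255, not negatives. */
    static float decodeNorm(byte b) {
        return NORM_TABLE[b & 0xFF];
    }

    /** Index time: combine a boost with an assumed 1/sqrt(length) factor. */
    static byte indexTimeNorm(float boost, int numTerms) {
        float lengthNorm = (float) (1.0 / Math.sqrt(numTerms));
        return encodeNorm(boost * lengthNorm);
    }

    /** Search time: the inner loop only multiplies by the decoded norm. */
    static float score(float rawScore, byte norm) {
        return rawScore * decodeNorm(norm);
    }

    public static void main(String[] args) {
        byte boosted = indexTimeNorm(1.0f, 4); // 1.0 * 1/2, stored as one byte
        byte plain = indexTimeNorm(0.5f, 4);   // 0.5 * 1/2, stored as one byte
        System.out.println(score(2.0f, boosted) + " " + score(2.0f, plain));
    }
}
```

With this hypothetical encoding any boost times length product above 1.0 is clamped, and norms are quantized to 256 levels, so documents whose norms differ by less than about 1/255 can end up tied.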

Hmmm... you know what, I hadn't considered performance when making the 
above suggestion.  Still, though, I don't see how to accomplish strict 
ordering of results without making a modification to the score() method.

I may be missing something, but my understanding at this point is that 
the ordering of results is determined by the score, and the score is a 
combination of the relevance of the hit (frequency/density of terms, 
etc.) and the norm values.  To predefine the order of results at 
index time, we have to be able to throw out the "hit relevance" portion 
of the score at search time.
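The point above can be illustrated with a small sketch, assuming made-up relevance and norm values for three hypothetical documents A, B and C. When the score is hitRelevance times norm, a document with a high relevance can outrank one with a higher index-time norm; only when relevance is dropped does the result order follow the norm alone.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch: relevance times norm does not preserve norm order. Data is made up. */
final class OrderingDemo {
    static final class Doc {
        final String id;
        final float hitRelevance;
        final float norm;

        Doc(String id, float hitRelevance, float norm) {
            this.id = id;
            this.hitRelevance = hitRelevance;
            this.norm = norm;
        }
    }

    // Default combination: relevance scaled by the index-time norm.
    static float combined(Doc d) {
        return d.hitRelevance * d.norm;
    }

    // Strict ordering: rank purely by the index-time norm.
    static float normOnly(Doc d) {
        return d.norm;
    }

    static List<Doc> sampleDocs() {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("A", 3.0f, 0.25f)); // very relevant, low norm
        docs.add(new Doc("B", 1.0f, 0.5f));
        docs.add(new Doc("C", 0.4f, 1.0f));  // barely relevant, highest norm
        return docs;
    }

    /** Returns document ids sorted by descending score. */
    static List<String> rank(List<Doc> docs, boolean strict) {
        Comparator<Doc> byScore =
            Comparator.comparingDouble(d -> strict ? normOnly(d) : combined(d));
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(byScore.reversed());
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(rank(sampleDocs(), false)); // prints [A, B, C]
        System.out.println(rank(sampleDocs(), true));  // prints [C, B, A]
    }
}
```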

Could we make applyNorm() a static method of Similarity and achieve 
acceptable performance?  The default implementation could be something like:

static float applyNorm(float hitRelevance, byte norm) {
    return hitRelevance * decodeNorm(norm);
}

For strict ordering:

static float applyNorm(float hitRelevance, byte norm) {
    // ignore hit relevance
    return decodeNorm(norm);
}
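A rough sketch of how the two static applyNorm() variants might sit in a scorer's inner loop. The class name ApplyNormSketch, the applyNormStrict() name, the scoreAll() loop and the linear decodeNorm() table are all hypothetical illustrations, not Lucene code. The strict/default choice is made once outside the loop so that the inner loop contains only a small static call that a JIT can inline.

```java
/**
 * Sketch of the proposed static applyNorm() hook in a scoring loop.
 * ASSUMPTION: decodeNorm() uses a hypothetical linear byte encoding.
 */
final class ApplyNormSketch {
    private static final float[] NORM_TABLE = new float[256];
    static {
        for (int i = 0; i < 256; i++) {
            NORM_TABLE[i] = i / 255.0f;
        }
    }

    static float decodeNorm(byte norm) {
        return NORM_TABLE[norm & 0xFF];
    }

    // Default behaviour: relevance scaled by the index-time norm.
    static float applyNorm(float hitRelevance, byte norm) {
        return hitRelevance * decodeNorm(norm);
    }

    // Strict ordering: hit relevance is deliberately ignored.
    static float applyNormStrict(float hitRelevance, byte norm) {
        return decodeNorm(norm);
    }

    /** Scores each doc; the variant is chosen once, outside the inner loop. */
    static float[] scoreAll(float[] hitRelevance, byte[] norms, boolean strict) {
        float[] scores = new float[hitRelevance.length];
        if (strict) {
            for (int doc = 0; doc < scores.length; doc++) {
                scores[doc] = applyNormStrict(hitRelevance[doc], norms[doc]);
            }
        } else {
            for (int doc = 0; doc < scores.length; doc++) {
                scores[doc] = applyNorm(hitRelevance[doc], norms[doc]);
            }
        }
        return scores;
    }
}
```

One design caveat: Java binds static methods at compile time, so a subclass of Similarity could not override a static applyNorm(). Switching to the strict variant would mean changing the call site (as in this sketch) or editing the method itself, rather than plugging in a different Similarity.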

> If you need to be able to dynamically change the scoring method at 
> search time then there will probably be a performance impact.  Ideally 
> this should still be an option; however, this would require opening up 
> the scorer API, so that folks could define different scorer 
> implementations for each Query class.  I'm not sure I yet want to take 
> on that task, but if you have a proposal, I'd love to hear it.

Heh, no proposals here... yet.  This topic directly affects the 
application development work I'm doing and I'd love to propose a 
solution (or otherwise contribute).  Though, I would want to gain a 
deeper understanding of Lucene before doing so.  I'm going to try to 
make some time to do so in the coming weeks, which will hopefully enable 
me to make an intelligent contribution a little later on.

