lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From José Ramón Pérez Agüera <>
Subject Re: problem in Lucene's ranking function
Date Wed, 05 May 2010 20:00:52 GMT
Hi Robert,

I will be very happy to see this problem fixed :-) I can not image
what reasons people have to use software with bugs, I guess that
others bugs in lucene are removed. Anyway, if finally you are going to
fix the problem, these are good news :-) thank you very much for your


On Wed, May 5, 2010 at 3:10 PM, Robert Muir <> wrote:
> 2010/5/5 José Ramón Pérez Agüera <>
>> Hi Robert,
>> the problem is not the linear combination of fields, the problem is to
>> apply the boost factor per field after the term frequency saturation
>> function and then make the linear combination of fields. Every system
>> that implement BM25F, including terrier, take care of that, because if
>> you don't do it you have a bug in your ranking function and not just a
>> different ranking function.
> José, well then this should not be much of a problem to handle in
> LUCENE-2392, because as I mentioned, if you have a tf() or idf() its really
> because you decided to do this yourself. So you could easily apply the boost
> inside your log or sqrt or whatever, if you want.
> But what I propose we do, is make sure the relevance functions we provide
> (especially any default for 4.0) take care of this for your structured case,
> while still providing the capability for someone to get the old behavior
> [see below]
>> If you implement this little
>> change, Lucene ranking fucntion will work properly with structured
>> documents and all your other concerns about allowing users to
>> implement different ranking functions for different situations will be
>> not affected by this change.
> Well, I'm not sure all my concerns go away! I think its best to implement a
> change like this in the flexible scoring framework (LUCENE-2392), so that
> users, if they want, can get the old behavior: "the bug" as you call it.
> The reason I say this due to the unique cases of lucene, some people are
> doing scoring in very crazy ways and if they aren't able to get the old
> behavior with regards to boosting, they might be upset... even if it is
> really giving them worse relevance...
> --
> Robert Muir

Jose R. Pérez-Agüera

Clinical Assistant Professor
Metadata Research Center
School of Information and Library Science
University of North Carolina at Chapel Hill
Web page:
MRC website:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message