lucene-dev mailing list archives

From Grant Ingersoll
Subject Re: Re[2]: lucene scoring
Date Thu, 07 Aug 2008 19:54:41 GMT

On Aug 7, 2008, at 3:05 PM, Александр Аристов wrote:

> I want to implement searching with the ability to set a so-called
> confidence level below which I would treat documents as garbage. I
> cannot define the level per query, as the level should be relevant
> for all documents.
> With the current scoring implementation, such a level would mean
> nothing. I can't believe that since that time (the thread is from
> 2005) nothing has been done toward resolving the issue.

That's because there is no resolution to be had, as far as I know, but  
I'm open to suggestions (patches are even better.)  What would it mean  
to say that a score of 0.5 for "baby kittens" is comparable to a score  
of 0.5 for "death metal"?  Like I said, I don't think that 0.5 for  
"baby kittens" is even comparable later if you added other documents  
that contain any of the query terms.
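To make the comparability point concrete, here is a toy sketch that scores a single-term query with simplified formulas modeled on Lucene's classic DefaultSimilarity: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), and a length norm of 1/sqrt(numTerms). Coord, boosts and the lossy norm encoding are ignored, and the class and method names are hypothetical. The same unchanged document gets a different score once unrelated documents are added to the index, simply because idf moves.

```java
public class ScoreDrift {
    // Simplified single-term score in the spirit of Lucene's classic TF-IDF.
    // For a one-term query, queryNorm = 1/idf cancels one of the two idf
    // factors, leaving tf * idf * lengthNorm. Illustration only, not Lucene code.
    static double score(int freq, int docLength, int docFreq, int numDocs) {
        double tf = Math.sqrt(freq);
        double idf = 1.0 + Math.log(numDocs / (double) (docFreq + 1));
        double lengthNorm = 1.0 / Math.sqrt(docLength);
        return tf * idf * lengthNorm;
    }

    public static void main(String[] args) {
        // Same document: the term occurs twice in a 16-term field,
        // and 10 documents in the index contain the term.
        double before = score(2, 16, 10, 100);  // index holds 100 docs
        double after = score(2, 16, 10, 1000);  // 900 docs added, none with the term
        System.out.printf("before=%.3f after=%.3f%n", before, after);
    }
}
```

Nothing about the document or the query changed, yet its score went up, so a fixed threshold like 0.5 would silently mean something different after the index grew.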

> Do you think any workaround would help, such as implementing more
> sophisticated queries so that we get approximately the same
> normalization values?

I just don't think you will be successful with this. I don't believe
it is a Lucene issue alone; it applies to all search engines, though
I could be wrong.

I get what you are trying to do, though; I've wanted to do it myself
from time to time. Another approach may be to look for significant
differences between scores within a result set. For example, if doc 1
is 0.8, doc 2 is 0.79 and doc 3 is 0.2, then maybe one could argue
that doc 3 is garbage, but even that is somewhat of a stretch.
Garbage truly is in the eye of the beholder.
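A minimal sketch of the "look for a big drop" idea, assuming the scores are already sorted in descending order and that a drop counts as significant when a score falls below some fraction of the one before it. The `ScoreGapCutoff` class, the `cutoff` method and the `ratio` parameter are hypothetical, not part of Lucene, and picking the ratio is exactly the judgment call described above.

```java
import java.util.Arrays;

public class ScoreGapCutoff {
    /**
     * Returns how many leading results to keep: everything before the first
     * position where a score falls below ratio * (the previous score).
     * Assumes sortedScores is sorted in descending order.
     */
    static int cutoff(float[] sortedScores, float ratio) {
        for (int i = 1; i < sortedScores.length; i++) {
            if (sortedScores[i] < ratio * sortedScores[i - 1]) {
                return i; // index of the first result after the big drop
            }
        }
        return sortedScores.length; // no significant drop: keep everything
    }

    public static void main(String[] args) {
        float[] scores = {0.8f, 0.79f, 0.2f};
        int keep = cutoff(scores, 0.5f);
        System.out.println("keep " + keep + ": "
                + Arrays.toString(Arrays.copyOf(scores, keep)));
    }
}
```

With the numbers from the example and a ratio of 0.5, the first two documents are kept and doc 3 is dropped. Because the cutoff is relative to the result set itself, it sidesteps the cross-query comparability problem, but it still cannot tell a genuinely weak tail from a legitimately steep one.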

Another option is to do more relevance tuning to make sure your top 10  
are as good as possible so that your garbage is minimized.
