lucene-dev mailing list archives

From DM Smith
Subject Re: Lucene's default settings & back compatibility
Date Tue, 19 May 2009 10:47:37 GMT

On May 18, 2009, at 11:31 PM, Robert Muir wrote:

> I am curious about this. Do you think it's a better default because
> it avoids the max boolean clauses problem, or because for a lot of
> these queries scoring doesn't make much sense anyway?
> I ran tests on a pretty big index, and you pay a price for the constant
> score/filter method. It's slower for the common-case searches; it
> only starts to win for queries that return > 10% or so of the index,
> and it's significantly slower for narrow queries...
> I'm just trying to imagine a case where queries that return > 10% or
> so of the index are actually the common/default...?

It is common in my application, a Bible program that indexes each
verse (think of a verse as a numbered sentence) as a separate
document. We index everything, including words that are typically stop
words, as those might be important to our end users. Besides this, the
top 280 word roots represent 90% of the occurrences.

On searches, we return everything in book order unless the user
wants to score the results. In that case, we return a small,
user-configurable number of hits ordered by score.
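
As a rough illustration of those two result modes, here is a minimal
sketch in plain Python. It does not use the Lucene API. The names
`in_book_order` and `top_by_score` and the sample hits are hypothetical,
and it assumes that document ids follow book order because verses are
indexed in sequence.

```python
import heapq
from typing import List, Tuple

# A hit is (doc_id, score). Assumption for this sketch: doc ids follow
# book order because verses are added to the index in sequence.
Hit = Tuple[int, float]


def in_book_order(hits: List[Hit]) -> List[int]:
    """Return every matching doc id in index (book) order, ignoring scores."""
    return sorted(doc_id for doc_id, _ in hits)


def top_by_score(hits: List[Hit], limit: int) -> List[int]:
    """Return at most `limit` doc ids, highest score first."""
    best = heapq.nlargest(limit, hits, key=lambda hit: hit[1])
    return [doc_id for doc_id, _ in best]


# Hypothetical sample data, not taken from a real index.
hits = [(42, 0.3), (7, 0.9), (19, 0.5), (3, 0.1)]
print(in_book_order(hits))
print(top_by_score(hits, 2))
```

In the first mode the scores are never consulted, which is one reason a
constant-score rewrite is a natural fit for that kind of search.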

And we are using Lucene out of the box for the most part. We've  
deviated only to incrementally solve performance problems.

>  * Constant score rewrite ought to be the default for most multi-term
>    queries
> -- 
> Robert Muir
