lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Lucene's default settings & back compatibility
Date Tue, 19 May 2009 10:50:07 GMT
On Mon, May 18, 2009 at 11:31 PM, Robert Muir <rcmuir@gmail.com> wrote:
> I am curious about this, do you think it's a better default because it avoids
> the max boolean clauses problem? Or because for a lot of these, scoring
> doesn't make much sense anyway?

I think you're referring to constant score mode default, for
MultiTermQuery & QueryParser, right?

> I ran tests on a pretty big index, you pay a price for the constant
> score/filter method. It's slower for the common case searches, it only starts
> to win for queries that return > 10% or so of the index, but it's significantly
> slower for narrow queries...
>
> I'm just trying to imagine a case where queries that return > 10% or so of
> the index are actually the common/default...?

Excellent points!  And this also makes clear why healthy discussion on
each default is important, as well as how good it'd be to have
Settings online so that we are free to even have such discussions
(vs being bound by back-compat which prevents any improvements
to the defaults).

I was actually referring to the fact that scores for MultiTermQuery
rewritten to BooleanQuery are often meaningless to the app (I
think?).  But you're right that the performance cost of the "make a
filter up front" approach is too high for smallish queries.

Thinking more on this... I'd love to have a constant-score mode, but
implemented as a BooleanQuery, meaning the scores would be the same
(constant) regardless of whether under-the-hood the query was
rewritten to BooleanQuery vs pre-compiled up front into a BitSet.

This would then decouple scoring from rewrite method, which in turn
would give us the freedom to pick and choose the fastest impl based on
particulars of the query.

So if we had such a ConstantScoreBooleanQuery, and we fixed
MultiTermQuery to conditionally use that, then I think we'd want
MultiTermQuery to do constant scoring by default.  (And, it'd then be
free to pick whether "create filter up front" or "use
ConstantScoreBooleanQuery" was most performant, query by query.)
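
A minimal sketch of that decoupling, in plain Java with no Lucene
dependency. Everything here is hypothetical and for illustration only:
the class name ConstantScoreSketch, the int[] postings model, and the
10% cutoff (borrowed from the rough figure quoted above) are all
assumptions, not Lucene API. The point is only that both strategies
hand back the same constant score, so the choice between them can be
made purely on cost.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: each int[] holds the doc IDs for one term matched by the
// multi-term query. None of these names exist in Lucene.
final class ConstantScoreSketch {

    // Assumed cutoff: past roughly 1/10 of the index, build the bit set up front.
    static final int CUTOFF_DIVISOR = 10;

    // Strategy 1: walk each term's postings, as the clauses of a boolean
    // query would, but give every hit the same score.
    static Map<Integer, Float> viaBooleanClauses(List<int[]> postings, float score) {
        Map<Integer, Float> hits = new TreeMap<>();
        for (int[] docs : postings) {
            for (int doc : docs) {
                hits.put(doc, score);
            }
        }
        return hits;
    }

    // Strategy 2: pre-compile all matching docs into a BitSet, then score
    // every set bit with the same constant.
    static Map<Integer, Float> viaBitSet(List<int[]> postings, int maxDoc, float score) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] docs : postings) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        Map<Integer, Float> hits = new TreeMap<>();
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            hits.put(doc, score);
        }
        return hits;
    }

    // Scoring no longer depends on the strategy, so pick one by estimated cost.
    // The sum of posting lengths is an upper bound on the number of hits.
    static Map<Integer, Float> search(List<int[]> postings, int maxDoc, float score) {
        long estimatedHits = 0;
        for (int[] docs : postings) {
            estimatedHits += docs.length;
        }
        boolean narrow = estimatedHits * CUTOFF_DIVISOR <= maxDoc;
        return narrow
                ? viaBooleanClauses(postings, score)
                : viaBitSet(postings, maxDoc, score);
    }

    public static void main(String[] args) {
        List<int[]> postings = List.of(new int[] {1, 4}, new int[] {4, 7});
        System.out.println(search(postings, 1000, 1.0f)); // narrow: clause path
        System.out.println(search(postings, 10, 1.0f));   // broad: bit set path
    }
}
```

Both calls in main return the same docs with the same scores; only the
work done under the hood differs. A real cutoff would have to come from
benchmarks rather than from this guess.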

Mike
