lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses
Date Wed, 14 May 2014 14:51:15 GMT


Michael McCandless commented on LUCENE-4396:

I like this tasks file!

But, maybe we could test on fewer terms, for the
Low/HighAndManyLow/High tasks?  I think it's more common to have a
handful (3-5 maybe) of terms.  But maybe keep your current category
and rename it to Tons instead of Many?

Thank you for adding the test case; it's always disturbing when
luceneutil finds a bug that "ant test" doesn't!  Maybe we can improve
the test so that it exercises BS and NBS?  E.g., toggle the "require
docs in order" via a custom collector?  We could commit this test today
to trunk/4x, right?

bq. A patch for luceneutil, which allows scores to differ within a tolerance range.

Hmm do we know why the scores changed?  Are we comparing BS2 to
NovelBS?  (I think BS and BS2 already have different scores today?).
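
For reference, a tolerance check of the kind the quoted patch describes
might look roughly like the sketch below.  This is not the code in
luceneutil-score-equal.patch (luceneutil itself is written in Python);
the class name ScoreTolerance, the method scoresMatch and the relative
tolerance parameter are all hypothetical, chosen only for illustration.

```java
final class ScoreTolerance {
    /**
     * Hypothetical helper: returns true if two scores agree within a
     * relative tolerance. Not taken from luceneutil or Lucene.
     */
    static boolean scoresMatch(float expected, float actual, float relTol) {
        if (expected == actual) {
            return true; // covers exact matches, including both zero
        }
        if (Float.isNaN(expected) || Float.isNaN(actual)) {
            return false; // NaN never matches anything
        }
        float diff = Math.abs(expected - actual);
        float scale = Math.max(Math.abs(expected), Math.abs(actual));
        return diff <= relTol * scale;
    }

    public static void main(String[] args) {
        float a = 1.2345678f;
        float b = Math.nextUp(a); // differs from a by one ulp
        System.out.println(scoresMatch(a, b, 1e-5f));
        System.out.println(scoresMatch(a, 1.3f, 1e-5f));
    }
}
```

A relative tolerance scales with the size of the scores, which seems a
better fit than a fixed absolute one if the difference comes only from
the order of float additions, but that is an assumption until we know
why the scores changed.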

So, with these changes, BS (a BulkScorer) can handle required clauses
(but you commented this out in your patch in order to test NBS I
guess?), and NBS (a Scorer) can handle required too.

Do you have any perf results of BS w/ required clauses (as a
BulkScorer) vs BS2 (what trunk does today)?

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>                 Key: LUCENE-4396
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: AndOr.tasks, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
>                      LUCENE-4396.patch, luceneutil-score-equal.patch
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there are one or more MUST clauses we always use BooleanScorer2.
> But I suspect that, unless the MUST clauses have a very low hit count
> compared to the other clauses, BooleanScorer would perform better than
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used
> to handle MUST, so it shouldn't be hard to bring back this capability ...
> I think the challenging part might be the heuristics on when to use which
> (likely we would have to use firstDocID as a proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the
> subs in this case, e.g. if the MUST clause suddenly skips 1000000 docs,
> then you want to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you are inspired!
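
The .advance() idea in the description above can be illustrated with a
toy sketch.  It does not use Lucene's classes: Postings is a made-up
stand-in for a postings iterator over a sorted int array, and the
gapThreshold parameter is a hypothetical knob, not a value proposed in
the issue.  When the MUST clause jumps far ahead of a SHOULD clause, the
SHOULD clause is repositioned with advance(); otherwise it steps with
nextDoc().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MustShouldSketch {
    /** Toy postings iterator over sorted doc IDs (not Lucene's DocIdSetIterator). */
    static final class Postings {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int pos = -1;

        Postings(int... sortedDocs) { this.docs = sortedDocs; }

        /** Current doc ID: -1 before the first call, NO_MORE_DOCS when exhausted. */
        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int nextDoc() {
            if (pos < docs.length) pos++;
            return docID();
        }

        /** Moves to the first doc >= target; binary search stands in for skip lists. */
        int advance(int target) {
            int lo = Math.max(pos, 0);
            int hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) lo = mid + 1; else hi = mid;
            }
            pos = lo;
            return docID();
        }
    }

    /** For each doc matching the MUST clause, records "doc:numberOfMatchingShouldClauses". */
    static List<String> score(Postings must, List<Postings> shoulds, int gapThreshold) {
        List<String> hits = new ArrayList<>();
        for (int doc = must.nextDoc(); doc != Postings.NO_MORE_DOCS; doc = must.nextDoc()) {
            int matched = 0;
            for (Postings s : shoulds) {
                long gap = (long) doc - s.docID();
                if (gap > gapThreshold) {
                    s.advance(doc);                      // big jump: skip ahead
                } else {
                    while (s.docID() < doc) s.nextDoc(); // small gap: step linearly
                }
                if (s.docID() == doc) matched++;
            }
            hits.add(doc + ":" + matched);
        }
        return hits;
    }

    public static void main(String[] args) {
        Postings must = new Postings(5, 1000000, 1000003);
        List<Postings> shoulds = Arrays.asList(
            new Postings(1, 5, 7, 1000000),
            new Postings(2, 3, 1000003, 1000010));
        System.out.println(score(must, shoulds, 16));
    }
}
```

Where the threshold should sit, and whether firstDocID is a good enough
proxy for hit count, are exactly the open heuristics questions the
description mentions; the sketch only shows the mechanics.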

This message was sent by Atlassian JIRA
