lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses
Date Tue, 29 Apr 2014 16:26:22 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984459#comment-13984459 ]

Michael McCandless commented on LUCENE-4396:
--------------------------------------------

Thanks Da, this looks neat!

Hmm, the patch didn't apply cleanly, but I was able to work through
it.  I think your dev area is not up to date with trunk?

Small code style things: can you add { .. } around the bodies of
if/else statements, even when they are only one line?  Also, no
whitespace inside the parentheses around the condition.  E.g. instead of:

{noformat}
      if ( required.size() > 0 )
        return new BooleanNovelScorer(this, disableCoord, minNrShouldMatch, required, optional, prohibited, maxCoord);
{noformat}

do this:

{noformat}
      if (required.size() > 0) {
        return new BooleanNovelScorer(this, disableCoord, minNrShouldMatch, required, optional, prohibited, maxCoord);
      }
{noformat}

So it looks like BooleanNovelScorer is able to be a Scorer because the
linked list of visited buckets in one window is guaranteed to be in
docID order, since we first visit the requiredConjunctionScorer's
docs in that window.
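
To make that ordering argument concrete, here is a minimal sketch. It is
not code from the patch: WindowSketch, visitWindow, WINDOW_SIZE and the
plain int[] inputs are all hypothetical stand-ins, and it assumes the
required docIDs arrive in strictly ascending order (as a postings
iterator would return them).  Required docs create buckets and append
them to the list tail; optional docs only update buckets that already
exist, so the list stays in docID order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and sizes are assumptions, not the patch's code.
final class WindowSketch {
    static final int WINDOW_SIZE = 8; // tiny window, chosen only for readability

    /**
     * Returns {docID, hitCount} pairs for one window, in the order the
     * linked list of buckets is walked. requiredDocs must be strictly ascending.
     */
    static List<int[]> visitWindow(int windowStart, int[] requiredDocs, int[] optionalDocs) {
        int windowEnd = windowStart + WINDOW_SIZE;
        int[] hits = new int[WINDOW_SIZE]; // per-bucket count of matching clauses
        int[] next = new int[WINDOW_SIZE]; // linked list of valid buckets
        int head = -1;
        int tail = -1;

        // Step 1: required docs go first. They arrive in ascending order and
        // each new bucket is appended at the tail, so the list is docID-sorted.
        for (int doc : requiredDocs) {
            if (doc < windowStart || doc >= windowEnd) {
                continue;
            }
            int slot = doc - windowStart;
            hits[slot] = 1;
            next[slot] = -1;
            if (head == -1) {
                head = slot;
            } else {
                next[tail] = slot;
            }
            tail = slot;
        }

        // Step 2: optional docs may arrive in any order, but they never create
        // a bucket; they only add to one that a required doc already created.
        for (int doc : optionalDocs) {
            if (doc < windowStart || doc >= windowEnd) {
                continue;
            }
            int slot = doc - windowStart;
            if (hits[slot] > 0) {
                hits[slot]++;
            }
        }

        // Step 3: walk the list; the docIDs come out ascending.
        List<int[]> visited = new ArrayList<>();
        for (int slot = head; slot != -1; slot = next[slot]) {
            visited.add(new int[] {windowStart + slot, hits[slot]});
        }
        return visited;
    }
}
```

For example, visitWindow(0, new int[] {1, 4, 6, 9}, new int[] {6, 2, 4})
walks docs 1, 4, 6 in that order: doc 9 falls outside the window and doc 2
has no required match, so neither gets a bucket.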

Have you tested performance when the .advance method here isn't called?
I.e., just boolean queries w/ one MUST and one or more SHOULD?  I think
the important question here is whether, and in which cases, the
BooleanNovelScorer approach beats BooleanScorer2.

I realized LUCENE-4872 is related here, i.e. we should also sometimes
use BooleanScorer for the minShouldMatch>1 case.


> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4396.patch, LUCENE-4396.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared to the other
> clauses, that BooleanScorer would perform better than BooleanScorer2.  BooleanScorer still
> has some vestiges from when it used to handle MUST so it shouldn't be hard to bring back this
> capability ... I think the challenging part might be the heuristics on when to use which (likely
> we would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs in this
> case, eg if suddenly the MUST clause skips 1000000 docs then you want to .advance() all the
> SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you are inspired!
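
The .advance() idea in the description above could look roughly like the
sketch below.  Everything in it is an assumption for illustration:
AdvanceSketch, SortedDocs, catchUp and skipThreshold are hypothetical
names, SortedDocs is a toy iterator over a sorted int[] rather than
Lucene's own iterator classes, and the binary search only stands in for
whatever skipping a real postings list provides.  The point is just the
decision: step doc by doc for small gaps, jump when the MUST clause has
moved far ahead.

```java
// Hypothetical illustration only; not Lucene API and not code from the patch.
final class AdvanceSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Toy forward-only iterator over a sorted array of docIDs. */
    static final class SortedDocs {
        private final int[] docs;
        private int pos = -1;

        SortedDocs(int[] docs) {
            this.docs = docs;
        }

        int docID() {
            if (pos < 0) {
                return -1; // not positioned yet
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int nextDoc() {
            if (pos < docs.length) {
                pos++;
            }
            return docID();
        }

        /** Moves to the first doc >= target that lies beyond the current position. */
        int advance(int target) {
            int lo = pos + 1;
            int hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            pos = Math.min(lo, docs.length);
            return docID();
        }
    }

    /**
     * Brings a SHOULD clause up to the MUST clause's current doc.
     * skipThreshold is an assumed tuning knob, not a value from the issue.
     */
    static int catchUp(SortedDocs should, int mustDoc, int skipThreshold) {
        int doc = should.docID();
        if (doc >= mustDoc) {
            return doc; // already caught up
        }
        if ((long) mustDoc - doc > skipThreshold) {
            return should.advance(mustDoc); // big gap: jump
        }
        while (doc < mustDoc) {
            doc = should.nextDoc(); // small gap: step linearly
        }
        return doc;
    }
}
```

Picking a good threshold is the same kind of heuristic question the
description raises about choosing between the two scorers; the sketch
does not try to answer it.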



--
This message was sent by Atlassian JIRA
(v6.2#6252)
