lucene-dev mailing list archives

From "Da Huang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses
Date Sun, 11 May 2014 00:51:15 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Da Huang updated LUCENE-4396:
-----------------------------

    Attachment: LUCENE-4396.patch

The patch is based on commit c1e423e45e6fa9f846ab2c382c0100fd515b4cb1 of the GitHub mirror.

This patch does the following:

1. Fix the bug in the last patch. The bug was caused by not resetting prev and next to null
before adding an element to a linked list.

2. Refine the code style.

3. Make a small improvement to .advance(). Performance is slightly better than with the last
patch, but still worse than trunk when tested with luceneutil.
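For readers unfamiliar with this class of bug, here is a minimal sketch of the fix in item 1. It assumes that nodes are recycled from one list into another. `Node` and `DList` are hypothetical names, not the patch's actual classes:

```java
// Hypothetical node and list types; the patch's real classes are not shown here.
final class Node {
    final int id;
    Node prev, next;

    Node(int id) { this.id = id; }
}

final class DList {
    Node head, tail;

    // Appends n at the tail. n may be recycled from an earlier list, so its
    // stale links are cleared first (mirroring the fix described in item 1).
    void add(Node n) {
        n.prev = null;
        n.next = null;
        if (tail == null) {
            head = tail = n;
        } else {
            tail.next = n;
            n.prev = tail;
            tail = n;
        }
    }

    // Drops all nodes without touching their links, so they can be reused.
    void clear() {
        head = tail = null;
    }

    // Counts nodes by walking next pointers from the head.
    int count() {
        int c = 0;
        for (Node n = head; n != null; n = n.next) {
            c++;
        }
        return c;
    }
}
```

Suppose you fill a list with a, b, c, clear it, and then re-add only b. Without the two reset lines, count() would return 2, because the recycled b still points at c.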

P.S. The bug in the last patch is not caught by "ant test", but it shows up when running a
query like "+a b" with luceneutil. I'm going to add a JUnit test case that detects the bug,
but it may take me a few days.
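A JUnit test for this could compare the scorer's hits against a brute-force reference. Below is a minimal sketch of such a reference for "+a b". It assumes documents are modeled as in-memory term sets rather than a real index. PlusAOptionalB is a hypothetical name, not a Lucene API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical brute-force oracle for the query "+a b": a document matches
// only if it contains the MUST term "a"; the SHOULD term "b" only adds to
// the number of matching clauses.
final class PlusAOptionalB {
    // Returns {docId, matchingClauseCount} pairs in doc ID order.
    static List<int[]> expectedHits(List<Set<String>> docs) {
        List<int[]> hits = new ArrayList<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            Set<String> terms = docs.get(doc);
            if (!terms.contains("a")) {
                continue; // MUST clause fails, so the doc is not a hit
            }
            int matched = 1 + (terms.contains("b") ? 1 : 0);
            hits.add(new int[] {doc, matched});
        }
        return hits;
    }
}
```

A randomized test would generate many such small corpora and assert that the scorer under test returns the same doc IDs (and clause counts) as this reference.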

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there are one or more MUST clauses, we always use BooleanScorer2.
> But I suspect that, unless the MUST clauses have a very low hit count compared to the
> other clauses, BooleanScorer would perform better than BooleanScorer2. BooleanScorer still
> has some vestiges from when it used to handle MUST, so it shouldn't be hard to bring back
> this capability ... I think the challenging part might be the heuristics for when to use
> which (we would likely have to use firstDocID as a proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs in this
> case, e.g. if the MUST clause suddenly skips 1000000 docs, then you want to .advance() all
> the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you are inspired!
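BooleanScorer scores documents a window at a time using a table of buckets. One way to let that approach handle MUST clauses is to count, per bucket, how many required clauses hit each doc. The sketch below rests on several assumptions:

- Clauses are sorted in-memory doc ID arrays.
- The window size of 2048 docs is chosen only for illustration.
- BucketScorerSketch is hypothetical.
- It omits scoring, MUST_NOT, and the heuristic for choosing between scorers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of window-at-a-time ("bucket") matching for MUST + SHOULD clauses.
// Each clause is a sorted, duplicate-free int[] of doc IDs in [0, maxDoc).
// This is an illustration, not Lucene's BooleanScorer.
final class BucketScorerSketch {
    static final int WINDOW = 2048; // hypothetical window size

    // Returns matching doc IDs in increasing order. A doc matches if every
    // MUST clause contains it, or (with no MUST clauses) if any SHOULD does.
    static List<Integer> search(int[][] must, int[][] should, int maxDoc) {
        int[] mustCount = new int[WINDOW];   // MUST clauses hitting each slot
        int[] shouldCount = new int[WINDOW]; // SHOULD clauses hitting each slot
        int[] mustPos = new int[must.length];
        int[] shouldPos = new int[should.length];
        List<Integer> hits = new ArrayList<>();
        for (int base = 0; base < maxDoc; base += WINDOW) {
            int end = Math.min(base + WINDOW, maxDoc);
            Arrays.fill(mustCount, 0);
            Arrays.fill(shouldCount, 0);
            // Each clause deposits its hits for this window into the buckets.
            collect(must, mustPos, base, end, mustCount);
            collect(should, shouldPos, base, end, shouldCount);
            for (int doc = base; doc < end; doc++) {
                int slot = doc - base;
                boolean match = must.length > 0
                        ? mustCount[slot] == must.length
                        : shouldCount[slot] > 0;
                if (match) {
                    hits.add(doc);
                }
            }
        }
        return hits;
    }

    // Consumes each clause's postings that fall in [base, end) and bumps
    // the corresponding bucket counters.
    private static void collect(int[][] clauses, int[] pos, int base, int end, int[] counts) {
        for (int c = 0; c < clauses.length; c++) {
            int[] postings = clauses[c];
            while (pos[c] < postings.length && postings[pos[c]] < end) {
                counts[postings[pos[c]] - base]++;
                pos[c]++;
            }
        }
    }
}
```

Note that every clause, including the SHOULD clauses, is still scanned doc by doc here. That cost is what the quoted .advance() suggestion targets when the MUST clause is sparse.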

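The .advance() suggestion can be sketched as a loop driven by the MUST clause. Whenever a SHOULD clause lags behind, it is advanced straight to the MUST clause's current doc instead of being stepped through the skipped range. ArrayDocIterator and MustDrivenScorer are hypothetical stand-ins. The iterator only mimics the nextDoc()/advance(target) contract of Lucene's DocIdSetIterator over an in-memory array:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical doc ID iterator over a sorted array, following the same
// nextDoc()/advance(target) contract as Lucene's DocIdSetIterator.
final class ArrayDocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int idx = -1;
    int advanceCalls = 0; // instrumentation for the example only

    ArrayDocIterator(int... docs) { this.docs = docs; }

    int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    int nextDoc() {
        idx++;
        return docID();
    }

    // Moves to the first doc >= target using binary search over the docs
    // ahead of the current position, so a long skip costs O(log n).
    int advance(int target) {
        advanceCalls++;
        int lo = idx + 1, hi = docs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) lo = mid + 1; else hi = mid;
        }
        idx = lo;
        return docID();
    }
}

final class MustDrivenScorer {
    // For each doc matching the MUST clause, returns {doc, number of SHOULD
    // clauses that also match}. SHOULD iterators are advanced directly to the
    // MUST clause's doc, so docs the MUST clause skips are never visited.
    static List<int[]> score(ArrayDocIterator must, ArrayDocIterator... should) {
        List<int[]> out = new ArrayList<>();
        for (int doc = must.nextDoc(); doc != ArrayDocIterator.NO_MORE_DOCS; doc = must.nextDoc()) {
            int matched = 0;
            for (ArrayDocIterator s : should) {
                int sd = s.docID();
                if (sd < doc) {
                    sd = s.advance(doc);
                }
                if (sd == doc) {
                    matched++;
                }
            }
            out.add(new int[] {doc, matched});
        }
        return out;
    }
}
```

Each SHOULD clause needs at most one advance() per MUST hit, no matter how many docs the MUST clause jumped over.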


--
This message was sent by Atlassian JIRA
(v6.2#6252)


