lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment
Date Mon, 12 Jul 2010 11:02:50 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2130:
---------------------------------------

    Attachment: LUCENE-2130.patch

OK, it turns out fixing MultiTermsEnum to optimize for this particular usage (seeking forward
a bit at a time) wasn't too bad -- patch attached.

I just track the last seek term, and if a given sub is already positioned on a term after the
new seek term, we can skip seeking it, since that would (very wastefully) just seek back to the
term it's already on.
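The skip rule above can be modelled in a few lines. This is a minimal sketch, not the code in LUCENE-2130.patch: `SubEnum` and `MultiEnum` are hypothetical stand-ins for Lucene's per-segment terms enums and MultiTermsEnum, each "segment" is just a sorted `String` array, and `next()` is not modelled. Under that assumption, after every seek each sub sits on its smallest term >= the last seek term, so a forward seek can leave alone any sub whose current term is already >= the new target.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the forward-seek optimization; not Lucene's MultiTermsEnum. */
public class ForwardSeekDemo {

    /** One per-segment enum over a sorted term array; current() is null when exhausted. */
    static final class SubEnum {
        final String[] terms;
        int pos = 0;
        int seekCount = 0; // number of real seeks this sub performed

        SubEnum(String... terms) {
            this.terms = terms.clone();
            Arrays.sort(this.terms);
        }

        String current() {
            return pos < terms.length ? terms[pos] : null;
        }

        /** Positions on the smallest term >= target via binary search. */
        void seek(String target) {
            seekCount++;
            int i = Arrays.binarySearch(terms, target);
            pos = i >= 0 ? i : -i - 1;
        }
    }

    static final class MultiEnum {
        final List<SubEnum> subs = new ArrayList<>();
        String lastSeekTerm = null;

        MultiEnum(SubEnum... subs) {
            this.subs.addAll(Arrays.asList(subs));
        }

        /** Seeks every sub to its smallest term >= target, skipping seeks that cannot move it. */
        void seek(String target) {
            boolean forward = lastSeekTerm != null && target.compareTo(lastSeekTerm) >= 0;
            for (SubEnum sub : subs) {
                String cur = sub.current();
                // Each sub is on its smallest term >= lastSeekTerm. On a forward seek, if that
                // term is already >= target (or the sub is exhausted), a real seek would land
                // on the very same position, so skip it.
                if (forward && (cur == null || cur.compareTo(target) >= 0)) {
                    continue;
                }
                sub.seek(target);
            }
            lastSeekTerm = target;
        }

        /** The merged enum's position: the smallest current term across all subs. */
        String term() {
            String min = null;
            for (SubEnum sub : subs) {
                String cur = sub.current();
                if (cur != null && (min == null || cur.compareTo(min) < 0)) {
                    min = cur;
                }
            }
            return min;
        }

        int totalSeeks() {
            int n = 0;
            for (SubEnum sub : subs) {
                n += sub.seekCount;
            }
            return n;
        }
    }

    public static void main(String[] args) {
        MultiEnum e = new MultiEnum(
            new SubEnum("apple", "banana", "unit"),
            new SubEnum("united", "zebra"),
            new SubEnum("cat", "unite"));
        // "Seeking forward a bit at a time", as described above.
        for (String t : new String[] {"uni", "unit", "unita", "unitb"}) {
            e.seek(t);
            System.out.println(t + " -> " + e.term());
        }
        System.out.println("real seeks: " + e.totalSeeks());
    }
}
```

Running `main` prints `uni -> unit`, `unit -> unit`, `unita -> unite`, `unitb -> unite` and finally `real seeks: 4`; re-seeking every sub on every call would have cost 12 seeks for the same positions.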

With this patch, for query united~0.6 (N=2) the multi-segment index takes ~1588 msec vs ~736
msec on the optimized index.  This is still not great (2.2X slower), so we should still pursue
per-segment rewrite for fuzzy query, but it is much better than 60X slower!  Progress, not perfection...

> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Mark Miller
>             Fix For: 4.0
>
>         Attachments: LUCENE-2130.patch, LUCENE-2130.patch, LUCENE-2130.patch
>
>
> This issue is likely not to go anywhere, but I thought we might explore it. The only
> idea I have come up with is fairly ugly, and unless something better comes up, this is not
> likely to happen.
> But if we could rewrite constant-score multi-term queries per segment, MTQs with auto
> rewrite (when the heuristic doesn't cut over to a constant filter) or constant boolean rewrite
> could enum terms against a single segment and then apply a boolean query against each segment
> with just the terms that are known to be in that segment. This also allows you to avoid
> DirectoryReader's MultiTermEnum and its PQ. (See Robert's comment below.)
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to queries and weights (lateCnstRewrite
> or something) that defaults to false. MTQ would return true if it's in a constant-score mode.
> On the top-level rewrite, if this is detected, an empty ConstantScoreQuery is made, its
> Weight is set to lateCnstRewrite, and it keeps a ref to the original MTQ query. It also
> gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight
> is lateCnstRewrite, we grab the original query, actually do the rewrite against the subreader,
> and grab the actual constant-score weight. I think it works, but it's a little ugly.
> Not sure it's worth the baggage for the win, but perhaps the objective can be met in
> another way.
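The deferred per-segment rewrite proposed in the issue description can be modelled roughly as follows. This is a minimal sketch under stated assumptions: `ToyQuery`, `ToyMultiTermQuery`, `DeferredConstantScoreQuery`, `lateConstantRewrite()`, `topLevelRewrite()` and `rewritePerSegment()` are hypothetical names, not Lucene API; the lateCnstRewrite flag that the proposal puts on both queries and weights is kept on the query only (there is no separate Weight); and each "segment" is just a list of terms. It shows the key property: each segment's rewritten query holds only terms actually present in that segment, so no cross-segment merged enum or priority queue is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the deferred ("lateCnstRewrite") rewrite; none of these are Lucene classes. */
public class LateRewriteDemo {

    interface ToyQuery {
        /** The proposed flag: false unless a query opts in. */
        default boolean lateConstantRewrite() { return false; }
    }

    /** Stand-in for a fuzzy/prefix/wildcard query: matches every term the predicate accepts. */
    static final class ToyMultiTermQuery implements ToyQuery {
        final Predicate<String> accepts;
        final boolean constantScoreMode;
        float boost = 1.0f;

        ToyMultiTermQuery(Predicate<String> accepts, boolean constantScoreMode) {
            this.accepts = accepts;
            this.constantScoreMode = constantScoreMode;
        }

        /** Only constant-score MTQs ask for the late rewrite. */
        @Override
        public boolean lateConstantRewrite() { return constantScoreMode; }

        /** Enumerates the terms of ONE segment: no merged enum and no priority queue. */
        List<String> rewriteAgainst(List<String> segmentTerms) {
            List<String> matching = new ArrayList<>();
            for (String term : segmentTerms) {
                if (accepts.test(term)) {
                    matching.add(term);
                }
            }
            return matching;
        }
    }

    /** The "empty" placeholder built at top level: keeps the original MTQ and copies its boost. */
    static final class DeferredConstantScoreQuery implements ToyQuery {
        final ToyMultiTermQuery original;
        final float boost;

        DeferredConstantScoreQuery(ToyMultiTermQuery original) {
            this.original = original;
            this.boost = original.boost;
        }

        /** Per-segment step: the real rewrite finally happens against each sub-reader. */
        List<List<String>> rewritePerSegment(List<List<String>> segments) {
            List<List<String>> perSegment = new ArrayList<>();
            for (List<String> segment : segments) {
                perSegment.add(original.rewriteAgainst(segment));
            }
            return perSegment;
        }
    }

    /** Top-level rewrite: defer flagged queries instead of enumerating terms across all segments. */
    static ToyQuery topLevelRewrite(ToyQuery query) {
        if (query.lateConstantRewrite()) {
            // In this sketch only ToyMultiTermQuery can set the flag.
            return new DeferredConstantScoreQuery((ToyMultiTermQuery) query);
        }
        return query; // unflagged queries would take the normal rewrite path (not modelled here)
    }

    public static void main(String[] args) {
        List<List<String>> segments = List.of(
            List.of("apple", "unit", "united"),
            List.of("banana", "unite"),
            List.of("zebra"));
        ToyMultiTermQuery mtq = new ToyMultiTermQuery(t -> t.startsWith("unit"), true);
        mtq.boost = 2.0f;

        ToyQuery rewritten = topLevelRewrite(mtq);
        if (rewritten instanceof DeferredConstantScoreQuery) {
            DeferredConstantScoreQuery deferred = (DeferredConstantScoreQuery) rewritten;
            System.out.println("boost = " + deferred.boost);
            System.out.println(deferred.rewritePerSegment(segments));
        }
    }
}
```

Running `main` prints `boost = 2.0` and then `[[unit, united], [unite], []]`: the third segment, which has no matching terms, gets an empty clause list instead of terms that only exist elsewhere in the index.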

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
