lucene-dev mailing list archives

From "Alan Woodward (JIRA)" <>
Subject [jira] [Updated] (LUCENE-7628) Add a getMatchingChildren() method to DisjunctionScorer
Date Thu, 12 Jan 2017 10:24:52 GMT


Alan Woodward updated LUCENE-7628:
    Attachment: LUCENE-7628.patch

Here's a patch opting for the first of the two options I describe above.  The change is pretty
small - just a default implementation on Scorer, and then specialised methods on DisjunctionScorer
and MinShouldMatchSumScorer.  I think this is the best way forward?
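
For readers without the patch to hand, here is a rough sketch of the shape described above: a default on the base class and a specialised override on the disjunction. It is a toy model, not the attached patch. The classes ToyScorer, ToyLeafScorer and ToyDisjunctionScorer and the matchesCurrentDoc() method are hypothetical, and the choice of getChildren() as the default return value is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy stand-in for a scorer. These are NOT Lucene's classes; all names are hypothetical.
abstract class ToyScorer {
    // Document the scorer (or its approximation) is currently positioned on.
    abstract int docID();

    // True only if the scorer fully matches docID(), i.e. any refinement check also passed.
    abstract boolean matchesCurrentDoc();

    // All sub-scorers, whether or not they match the current document.
    List<ToyScorer> getChildren() {
        return Collections.emptyList();
    }

    // Assumed default: every child is treated as matching.
    List<ToyScorer> getMatchingChildren() {
        return getChildren();
    }
}

final class ToyLeafScorer extends ToyScorer {
    private final int doc;
    private final boolean refinementMatched;

    ToyLeafScorer(int doc, boolean refinementMatched) {
        this.doc = doc;
        this.refinementMatched = refinementMatched;
    }

    @Override int docID() { return doc; }

    @Override boolean matchesCurrentDoc() { return refinementMatched; }
}

final class ToyDisjunctionScorer extends ToyScorer {
    private final List<ToyScorer> children;

    ToyDisjunctionScorer(ToyScorer... children) {
        this.children = Arrays.asList(children);
    }

    // Positioned on the smallest document among its children.
    @Override int docID() {
        int min = Integer.MAX_VALUE;
        for (ToyScorer c : children) {
            min = Math.min(min, c.docID());
        }
        return min;
    }

    @Override boolean matchesCurrentDoc() { return !getMatchingChildren().isEmpty(); }

    @Override List<ToyScorer> getChildren() { return children; }

    // Specialised: only children on the current doc whose refinement also matched.
    @Override List<ToyScorer> getMatchingChildren() {
        int doc = docID();
        List<ToyScorer> matching = new ArrayList<>();
        for (ToyScorer c : children) {
            if (c.docID() == doc && c.matchesCurrentDoc()) {
                matching.add(c);
            }
        }
        return matching;
    }
}
```

With three leaves, two on doc 5 (one of which failed its refinement) and one on doc 9, getChildren() on the disjunction returns all three while getMatchingChildren() returns only the confirmed leaf on doc 5.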

> Add a getMatchingChildren() method to DisjunctionScorer
> -------------------------------------------------------
>                 Key: LUCENE-7628
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Minor
>         Attachments: LUCENE-7628.patch
> This one is a bit convoluted, so bear with me...
> The luwak highlighter works by rewriting queries into their Span-equivalents, and then
> running them with a special Collector.  At each matching doc, the highlighter gathers all
> the Spans objects positioned on the current doc and collects their positions using the SpanCollection
> Some queries can't be translated into Spans.  For those queries that generate Scorers
> with ChildScorers, like BooleanQuery, we can call .getChildren() on the Scorer and see if
> any of them are SpanScorers, and for those that aren't we can call .getChildren() again and
> recurse down.  For each child scorer, we check that it's positioned on the current document,
> so non-matching subscorers can be skipped.
> This all works correctly *except* in the case of a DisjunctionScorer where one of the
> children is a two-phase iterator that has matched its approximation, but not its refinement
> query.  A SpanScorer in this situation will be correctly positioned on the current document,
> but its Spans will be in an undefined state, meaning the highlighter will either collect incorrect
> hits, or it will throw an Exception and prevent hits being collected from other subspans.
> We've tried various ways around this (including forking SpanNearQuery and adding a bunch
> of slow position checks to it that are used only by the highlighting code), but it turns out
> that the simplest fix is to add a new method to DisjunctionScorer that only returns the currently
> matching child Scorers.  It's a bit of a hack, and it won't be used anywhere else, but it's
> a fairly small and contained hack.
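
The failure mode in the quoted description can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: ScorerNode and HighlightWalk are hypothetical names, not luwak or Lucene classes, and the `refined` flag stands in for "the two-phase refinement matched". It contrasts a walk that checks position only with one that also requires the refinement to have matched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of one node in a scorer tree; not a Lucene class.
final class ScorerNode {
    final String name;
    final int doc;              // document the node's approximation is positioned on
    final boolean refined;      // false: approximation matched, refinement did not
    final boolean isSpanScorer; // leaf whose positions a highlighter would collect
    final List<ScorerNode> children;

    ScorerNode(String name, int doc, boolean refined, boolean isSpanScorer,
               ScorerNode... children) {
        this.name = name;
        this.doc = doc;
        this.refined = refined;
        this.isSpanScorer = isSpanScorer;
        this.children = Arrays.asList(children);
    }
}

final class HighlightWalk {
    // Walk from the description: recurse through children, skipping only those
    // that are not positioned on the current document.
    static void collectByPosition(ScorerNode node, int doc, List<String> out) {
        if (node.doc != doc) {
            return;
        }
        if (node.isSpanScorer) {
            out.add(node.name);
            return;
        }
        for (ScorerNode child : node.children) {
            collectByPosition(child, doc, out);
        }
    }

    // Same walk, but a node is visited only if its refinement also matched,
    // which is the information a "matching children" accessor would expose.
    static void collectMatching(ScorerNode node, int doc, List<String> out) {
        if (node.doc != doc || !node.refined) {
            return;
        }
        if (node.isSpanScorer) {
            out.add(node.name);
            return;
        }
        for (ScorerNode child : node.children) {
            collectMatching(child, doc, out);
        }
    }
}
```

Given a disjunction on doc 7 whose children are a span leaf on doc 7 with a failed refinement, a span leaf on doc 7 that fully matched, and a span leaf on doc 12, the position-only walk collects both doc-7 leaves, including the one whose spans would be in an undefined state. The second walk collects only the fully matched leaf.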
