lucene-dev mailing list archives

From "Jim Ferenczi (JIRA)" <>
Subject [jira] [Commented] (LUCENE-8286) UnifiedHighlighter should support the new Weight.matches API for better match accuracy
Date Wed, 02 May 2018 14:37:00 GMT


Jim Ferenczi commented on LUCENE-8286:

I also think that it would greatly simplify the code (especially PhraseHelper ;) ), but matches
require some changes to allow this replacement. First of all, there is no way to retrieve the
term/query from the matches iterator, so it's not possible to count the number of occurrences
of a specific query or the total frequency in the document. This information is needed
to compute the score of a passage, so we need to add something to matches.
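
To make the missing piece concrete, here is a minimal sketch of the per-term counting that passage scoring relies on. It does not use the Lucene API: the TermMatch class and its term field are hypothetical and stand in for whatever the matches iterator would need to expose.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PassageTermCounts {

    /** Hypothetical match: character offsets plus the term that produced it. */
    public static final class TermMatch {
        public final String term;
        public final int startOffset; // inclusive
        public final int endOffset;   // exclusive

        public TermMatch(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Counts, per term, the matches that lie entirely inside [passageStart, passageEnd). */
    public static Map<String, Integer> countInPassage(
            List<TermMatch> matches, int passageStart, int passageEnd) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (TermMatch m : matches) {
            if (m.startOffset >= passageStart && m.endOffset <= passageEnd) {
                counts.merge(m.term, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Without the term on each match, the map key above has nothing to be built from, which is the gap described in the comment.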
The matches iterator can return duplicates (if the same term is present in multiple clauses)
and will soon be able to return matches from phrases (rather than individual terms), which
means that we'll need to detect overlapping intervals when the passages are built. I see this
as an improvement since it would allow highlighting entire phrases, but for spans we'll need
an option to split match intervals, since a span near (or any other span query) can have big
gaps and it would not make sense to highlight the entire match as a single highlight.
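
One possible way to handle duplicates and overlaps is to sort the match offsets and merge them before building passages. The sketch below is plain Java over [start, end) offset pairs, not Lucene code; the class and method names are made up for illustration, and it does not cover the span-splitting case, which would need access to the individual sub-matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MatchIntervals {

    /**
     * Merges duplicate or overlapping [start, end) offset pairs into
     * disjoint ranges, ordered by start offset. The input is not modified.
     */
    public static List<int[]> mergeOverlapping(int[][] intervals) {
        int[][] sorted = new int[intervals.length][];
        for (int i = 0; i < intervals.length; i++) {
            sorted[i] = intervals[i].clone();
        }
        Arrays.sort(sorted, Comparator.comparingInt((int[] a) -> a[0]));

        List<int[]> merged = new ArrayList<>();
        for (int[] cur : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && cur[0] < last[1]) {
                // Overlaps (or duplicates) the previous range: extend it.
                last[1] = Math.max(last[1], cur[1]);
            } else {
                merged.add(cur);
            }
        }
        return merged;
    }
}
```

Ranges that merely touch (one ends where the next starts) are kept separate here; whether adjacent matches should be joined into one highlight is a design choice left open.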
One thing we could do to simplify the transition is to remove OffsetsEnum entirely and replace
it with the MatchesIterator; apart from the missing bits I described above, this should be
easy to do.

> UnifiedHighlighter should support the new Weight.matches API for better match accuracy
> --------------------------------------------------------------------------------------
>                 Key: LUCENE-8286
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Priority: Major
> The new Weight.matches() API should allow the UnifiedHighlighter to more accurately highlight
> some BooleanQuery patterns correctly -- see LUCENE-7903.
> In addition, this API should make the job of highlighting easier, reducing the LOC and
> related complexities, especially the UH's PhraseHelper. Note: reducing/removing PhraseHelper
> is not a near-term goal since Weight.matches is experimental and incomplete, and perhaps we'll
> discover some gaps in flexibility/functionality.
> This issue should introduce a new UnifiedHighlighter.HighlightFlag enum option for this
> method of highlighting. Perhaps call it {{WEIGHT_MATCHES}}? Longer term it could go away
> and it'll be implied if you specify enum values for PHRASES & MULTI_TERM_QUERY?
