lucene-dev mailing list archives

From "Michael Gibney (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-7848) QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
Date Wed, 12 Jul 2017 13:25:00 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gibney updated LUCENE-7848:
-----------------------------------
    Attachment: LUCENE-7848-branching-spanOr.patch

"Could be a bug somewhere in span queries." -- I think the remaining problem here is that
only one branch (the shortest) of a SpanOrQuery is evaluated, at which point the "spanOr"
is reported as a match (or not) with the width/endPosition of that shortest branch. When the
branches of a "spanOr" differ in length (as they routinely will with GraphFilters, as in
the above test), only the shorter branch is evaluated. If a longer branch also matches, the
subsequent tokens sit further along than the shorter branch implies, so the enclosing
"spanNear" sees a larger-than-expected slop and fails to match.
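
To make the failure mode concrete, here is a minimal, self-contained Java sketch. It does
not use Lucene classes and is not code from the patch: Span, shortestOnly, orderedNear, and
the positions are hypothetical stand-ins, with end positions treated as exclusive. It assumes
a "spanOr" whose two branches both start at position 0 but end at 1 and 2, followed by a
clause at position 2, with slop 0.

```java
import java.util.Arrays;
import java.util.List;

class ShortestBranchDemo {
    /** A match covering positions [start, end); end is exclusive in this model. */
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Models the reported behavior: only the shortest branch at a start is surfaced. */
    static Span shortestOnly(List<Span> branches) {
        Span best = branches.get(0);
        for (Span s : branches) {
            if (s.end < best.end) best = s;
        }
        return best;
    }

    /** Ordered "near" check: next must begin within slop positions of prev.end. */
    static boolean orderedNear(Span prev, Span next, int slop) {
        int gap = next.start - prev.end;
        return gap >= 0 && gap <= slop;
    }

    public static void main(String[] args) {
        // Hypothetical positions: a one-position branch and a two-position branch, same start.
        List<Span> spanOr = Arrays.asList(new Span(0, 1), new Span(0, 2));
        Span following = new Span(2, 3);

        // Shortest branch ends at 1, so the following clause appears one position too far away.
        System.out.println(orderedNear(shortestOnly(spanOr), following, 0)); // false
        // The longer branch ends at 2, directly adjacent to the following clause.
        System.out.println(orderedNear(spanOr.get(1), following, 0));        // true
    }
}
```

The enclosing near check only succeeds when it is given the longer branch, which the
shortest-only evaluation never offers.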

The attached LUCENE-7848-branching-spanOr.patch adjusts SpanOrQuery to support repeated calls
to nextStartPosition() that return the same startPosition but different endPositions. The
subSpan clauses of the "spanOr" are popped off the priority queue, retained, and restored once
the subSpans at that position are exhausted (when it's time to move on to the next potential
match). Some corresponding changes were necessary to make NearSpansOrdered aware of the new
"spanOr" behavior, so that it conditionally evaluates as many branches of "spanOr" clauses as
necessary to match (or not) on the full "nearSpan".
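
The pop/retain/restore idea can be sketched in plain Java as follows. This is an illustrative
model under stated assumptions, not the code in the patch and not Lucene's API: the class
BranchingSpanOrSketch, the SubSpans helper, collect, matchesBefore, and the int[]{start, end}
representation are all hypothetical, and end positions are treated as exclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class BranchingSpanOrSketch {
    static final int NO_MORE = Integer.MAX_VALUE;

    /** One sub-clause: walks its own {start, end} matches in order. */
    static final class SubSpans {
        private final Iterator<int[]> it;
        int start = -1, end = -1;
        SubSpans(List<int[]> matches) { this.it = matches.iterator(); }
        boolean advance() {
            if (!it.hasNext()) { start = end = NO_MORE; return false; }
            int[] m = it.next();
            start = m[0];
            end = m[1];
            return true;
        }
    }

    // Sub-clauses ordered by start, then by end (shortest branch first).
    private final PriorityQueue<SubSpans> queue = new PriorityQueue<>(
        Comparator.<SubSpans>comparingInt(s -> s.start).thenComparingInt(s -> s.end));
    // Sub-clauses already reported at the current start position.
    private final List<SubSpans> retained = new ArrayList<>();
    int start = -1, end = -1;

    BranchingSpanOrSketch(List<List<int[]>> clauses) {
        for (List<int[]> c : clauses) {
            SubSpans s = new SubSpans(c);
            if (s.advance()) queue.add(s);
        }
    }

    /** May return the same start repeatedly, each time with a different end. */
    int nextStartPosition() {
        SubSpans top = queue.peek();
        if (retained.isEmpty() || top == null || top.start != start) {
            // Branches at the current start are exhausted: advance the retained
            // sub-clauses and restore the live ones to the queue.
            for (SubSpans s : retained) {
                if (s.advance()) queue.add(s);
            }
            retained.clear();
            top = queue.peek();
            if (top == null) { start = end = NO_MORE; return NO_MORE; }
        }
        queue.poll();
        retained.add(top);
        start = top.start;
        end = top.end;
        return start;
    }

    /** Lists every (start, end) pair as "start-end", in the order reported. */
    static List<String> collect(List<List<int[]>> clauses) {
        BranchingSpanOrSketch spans = new BranchingSpanOrSketch(clauses);
        List<String> out = new ArrayList<>();
        while (spans.nextStartPosition() != NO_MORE) {
            out.add(spans.start + "-" + spans.end);
        }
        return out;
    }

    /** Tries every branch until one ends within slop of the following clause's start. */
    static boolean matchesBefore(List<List<int[]>> clauses, int followingStart, int slop) {
        BranchingSpanOrSketch spans = new BranchingSpanOrSketch(clauses);
        while (spans.nextStartPosition() != NO_MORE) {
            int gap = followingStart - spans.end;
            if (gap >= 0 && gap <= slop) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<int[]>> clauses = Arrays.asList(
            Arrays.asList(new int[]{0, 1}, new int[]{5, 6}),
            Collections.singletonList(new int[]{0, 2}));
        System.out.println(collect(clauses));             // [0-1, 0-2, 5-6]
        System.out.println(matchesBefore(clauses, 2, 0)); // true
    }
}
```

In this model the caller sees start position 0 twice, first with end 1 and then with end 2,
so an enclosing ordered-near check can keep asking for branches until one lines up with the
following clause or the branches at that start run out.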

Other code that calls the modified "spanOr" may also need changes to account for its new
behavior, but with this patch applied, all the tests in TestWordDelimiterGraphFilter
pass (including the new testLucene7848()).

> QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
> --------------------------------------------------------------
>
>                 Key: LUCENE-7848
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7848
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 6.5, 6.6
>            Reporter: Jim Ferenczi
>         Attachments: capture-3.png, LUCENE-7848-branching-spanOr.patch, LUCENE-7848.patch, LUCENE-7848.patch
>
>
> Position increments greater than 1 are ignored when the query builder creates a graph phrase query.
> Instead it should use SpanNearQuery.addGap for pos incr > 1.


