lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7603) Support Graph Token Streams in QueryBuilder
Date Wed, 28 Dec 2016 10:36:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782609#comment-15782609 ]

Michael McCandless commented on LUCENE-7603:
--------------------------------------------

This change looks great; I think it's ready!  The new {{TestGraphTokenStreamFiniteStrings}}
is just missing the copyright header; I'll fix that before pushing.

The gist of the change is that when query parsing detects that the analyzer produced a graph
(any token with {{PositionLengthAttribute}} > 1), e.g. because {{SynonymGraphFilter}} matched
or inserted a multi-token synonym, it creates a {{GraphQuery}}, which is just a wrapper around
sub-queries that traverse each path of the graph.
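
To make the graph detection and path traversal concrete, here is a minimal plain-Java sketch. It does not use the Lucene API: the {{Token}} class, its {{posInc}}/{{posLen}} fields and the {{GraphPathsSketch}} name are hypothetical stand-ins for a token's term, position increment and position length, and the sketch assumes every position length is at least 1.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a token graph (hypothetical types, not Lucene classes).
class GraphPathsSketch {

    static final class Token {
        final String term;
        final int posInc;   // positions advanced from the previous token (0 = stacked)
        final int posLen;   // positions spanned; a value > 1 marks a graph token

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    // The stream is treated as a graph if any token spans more than one position.
    static boolean isGraph(List<Token> tokens) {
        for (Token t : tokens) {
            if (t.posLen > 1) {
                return true;
            }
        }
        return false;
    }

    // Enumerate every path of terms from the first position to the last one.
    static List<List<String>> paths(List<Token> tokens) {
        int pos = -1;
        int end = 0;
        List<int[]> arcs = new ArrayList<>();   // each arc is {from, to, tokenIndex}
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            pos += t.posInc;
            arcs.add(new int[] {pos, pos + t.posLen, i});
            end = Math.max(end, pos + t.posLen);
        }
        List<List<String>> out = new ArrayList<>();
        walk(tokens, arcs, 0, end, new ArrayList<>(), out);
        return out;
    }

    // Depth-first walk: follow every arc that starts at the current position.
    private static void walk(List<Token> tokens, List<int[]> arcs, int at, int end,
                             List<String> prefix, List<List<String>> out) {
        if (at == end) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (int[] arc : arcs) {
            if (arc[0] == at) {
                prefix.add(tokens.get(arc[2]).term);
                walk(tokens, arcs, arc[1], end, prefix, out);
                prefix.remove(prefix.size() - 1);
            }
        }
    }
}
```

For the tokens wifi (posInc 1, posLen 2), wi (0, 1), fi (1, 1), router (1, 1), {{isGraph}} returns true and {{paths}} returns [[wifi, router], [wi, fi, router]].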

At search time, this query is currently rewritten to a {{BooleanQuery}} with one clause for
each path, but that is something we may be able to improve in the future, e.g. if it's a phrase
query we could use {{TermAutomatonQuery}} ... but we should tackle that separately.
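
As a rough illustration of the "one clause per path" rewrite, the sketch below renders paths as a query string. It is only a string model, not Lucene's {{BooleanQuery}}; the {{PathDisjunctionSketch}} name is hypothetical, and it assumes, for illustration only, that every term on a path is required while the paths themselves are alternatives.

```java
import java.util.ArrayList;
import java.util.List;

// String-only model of rewriting graph paths into a disjunction (hypothetical helper).
class PathDisjunctionSketch {

    // One clause per path: terms within a path are joined with AND (assumed),
    // and the per-path clauses are joined with OR.
    static String rewrite(List<List<String>> paths) {
        List<String> clauses = new ArrayList<>();
        for (List<String> path : paths) {
            clauses.add("(" + String.join(" AND ", path) + ")");
        }
        return String.join(" OR ", clauses);
    }
}
```

With the paths [wifi, router] and [wi, fi, router], {{rewrite}} returns (wifi AND router) OR (wi AND fi AND router).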

At long last, this (along with using {{SynonymGraphFilter}} at search time) fixes
the long-standing bugs around multi-token synonyms, e.g. LUCENE-4499, LUCENE-1622, https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter
...

This will also be useful for other tokenizers/token filters, e.g. I'm working on having
{{WordDelimiterFilter}} set position length correctly, and Kuromoji ({{JapaneseTokenizer}})
already produces graph tokens.

> Support Graph Token Streams in QueryBuilder
> -------------------------------------------
>
>                 Key: LUCENE-7603
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7603
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser, core/search
>            Reporter: Matt Weber
>
> With [LUCENE-6664|https://issues.apache.org/jira/browse/LUCENE-6664] we can use multi-term
> synonyms at query time.  A "graph token stream" will be created, which is nothing more than
> using the position length attribute on stacked tokens to indicate how many positions a token
> should span.  Currently the position length attribute on tokens is ignored during query parsing.
> This issue will add support for handling these graph token streams inside the QueryBuilder
> utility class used by query parsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

