lucene-dev mailing list archives

From "Jim Ferenczi (JIRA)" <>
Subject [jira] [Commented] (LUCENE-7893) SynonymGraphFilterFactory proximity search error
Date Thu, 06 Jul 2017 23:38:01 GMT


Jim Ferenczi commented on LUCENE-7893:

OK, I checked the Solr code: the problem is that the slop is only applied to phrase and
multi-phrase queries.
Specifically, SolrQueryParser#getFieldQuery applies the phrase slop to phrase and multi-phrase
queries but not to SpanNearQuery.
A SpanNearQuery is used when analysis of the quoted terms produces a graph token stream.
[~diogoedl], are you interested in providing a patch for Solr?
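
To make the branch described above concrete, here is a minimal toy model in Java. It is not Solr or Lucene code: `ParsedQuery`, `Kind`, `applySlopReported` and `applySlopProposed` are hypothetical names invented for this sketch, and the "proposed" variant only illustrates the idea of also handling the span-near case. An actual patch to SolrQueryParser#getFieldQuery may look quite different.

```java
/** Toy model, not Lucene: hypothetical query types showing which branches receive the slop. */
final class SlopBranchSketch {

    enum Kind { PHRASE, MULTI_PHRASE, SPAN_NEAR }

    /** Hypothetical stand-in for a parsed query: only a kind and a slop. */
    static final class ParsedQuery {
        final Kind kind;
        final int slop;

        ParsedQuery(Kind kind, int slop) {
            this.kind = kind;
            this.slop = slop;
        }
    }

    /** Behaviour as reported: only phrase and multi-phrase queries get the slop. */
    static ParsedQuery applySlopReported(ParsedQuery q, int slop) {
        switch (q.kind) {
            case PHRASE:
            case MULTI_PHRASE:
                return new ParsedQuery(q.kind, slop);
            default:
                return q; // SPAN_NEAR is returned unchanged, so its slop stays at 0
        }
    }

    /** Possible direction for a fix (assumption): handle the span-near case as well. */
    static ParsedQuery applySlopProposed(ParsedQuery q, int slop) {
        return new ParsedQuery(q.kind, slop);
    }

    public static void main(String[] args) {
        // A graph token stream yields a span-near query with the default slop of 0.
        ParsedQuery graphQuery = new ParsedQuery(Kind.SPAN_NEAR, 0);
        System.out.println(applySlopReported(graphQuery, 8).slop); // prints 0
        System.out.println(applySlopProposed(graphQuery, 8).slop); // prints 8
    }
}
```

In this model the slop requested after the tilde is lost only for the span-near kind, which matches the symptom in the issue description.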

> SynonymGraphFilterFactory proximity search error
> ------------------------------------------------
>                 Key: LUCENE-7893
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 6.6
>            Reporter: Diogo Guilherme Leão Edelmuth
> There seems to be an issue when doing proximity searches that include terms that have
> multi-word synonyms.
> Example:
> Consider the following configured in synonyms.txt:
> (
> grand mother, grandmother
> grandfather, granddad
> )
> and there's an indexed field with: (My mother and my grandmother went...)
> Proximity search with: ("mother grandmother"~8)
> won't return the file, while ("father grandfather"~8) does return the analogous file.
> I am not a developer of Solr, so pardon me if I am wrong, but I ran it with debug=query
> and saw that when proximity searches are done with multi-term synonyms, the query that is
> built is a SpanNearQuery:
> "parsedquery":"SpanNearQuery(spanNear([laudo:mother,
> spanOr([laudo:grand mother, laudo:grandmother])],*0*, true))"
> while proximity searches with one-term synonyms are executed with:
> "MultiPhraseQuery(laudo:\"father (grandfather granddad)\"~10)"
> Note that the SpanNearQuery is built with a slop parameter of 0, no matter what is
> passed after the tilde. So if I search for the exact phrase, it does match.
> Here is my field-type, just in case:
> <fieldType name="text_pt_synonyms_ascii_minimal_lightStem" class="solr.TextField"
>     <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_pt.txt"
>         <filter class="solr.PortugueseLightStemFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_pt.txt" ignoreCase="true"/>
>         <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
>         <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true"
>         <filter class="solr.PortugueseLightStemFilterFactory"/>
>     </analyzer>
> </fieldType>
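
To illustrate why a slop of 0 fails on the reporter's example while a slop of 8 would succeed, here is a small dependency-free Java sketch of ordered proximity matching between two single-token terms. It is a simplified model, not Lucene's implementation: the class and method names are made up for illustration, and the document is assumed to be tokenized by simple lowercasing, with no stop-word removal or stemming.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model, not Lucene: ordered proximity between two single-token terms. */
final class OrderedNearSketch {

    /** Returns the positions at which the term occurs in the token list. */
    static List<Integer> positions(List<String> tokens, String term) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(term)) {
                out.add(i);
            }
        }
        return out;
    }

    /** True if some occurrence of first is followed by second with at most slop tokens in between. */
    static boolean orderedNear(List<String> tokens, String first, String second, int slop) {
        for (int p1 : positions(tokens, first)) {
            for (int p2 : positions(tokens, second)) {
                int gap = p2 - p1 - 1; // number of tokens between the two terms
                if (p2 > p1 && gap <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed tokenization of the indexed text from the issue.
        List<String> doc = Arrays.asList("my", "mother", "and", "my", "grandmother", "went");
        System.out.println(orderedNear(doc, "mother", "grandmother", 0)); // prints false
        System.out.println(orderedNear(doc, "mother", "grandmother", 8)); // prints true
    }
}
```

Two tokens sit between "mother" and "grandmother" here, so under this model the query can only match once the slop is at least 2, which is consistent with the exact phrase matching and the proximity search not matching when the slop is stuck at 0.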

This message was sent by Atlassian JIRA
