lucene-dev mailing list archives

From "ZhongHua Wu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-8767) DisjunctionMaxQuery do not work well when multiple search term+mm+query fields with different fieldType.
Date Wed, 17 Apr 2019 08:27:00 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhongHua Wu updated LUCENE-8767:
--------------------------------
    Summary: DisjunctionMaxQuery do not work well when multiple search term+mm+query fields
with different fieldType.  (was: DisjunctionMaxQuery do not work well when multiple search
term+synonyms+mm+query fields with different fieldType.)

> DisjunctionMaxQuery do not work well when multiple search term+mm+query fields with different
fieldType.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8767
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8767
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 7.3
>         Environment: Solr: 7.3.1
> Backup:
> FieldType for name field:
> <fieldType name="forName" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
>  <analyzer>
>  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" preserveOriginal="1" splitOnNumerics="0"/>
>  <filter class="solr.LowerCaseFilterFactory"/>
>  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
>  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>  </analyzer>
>  </fieldType>
> FieldType for partNumber field:
> <fieldType name="forPartNumber" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>  <analyzer>
>  <tokenizer class="solr.KeywordTokenizerFactory"/>
>  <filter class="solr.LowerCaseFilterFactory"/>
>  <filter class="solr.TrimFilterFactory" />
>  </analyzer>
>  </fieldType>
>            Reporter: ZhongHua Wu
>            Priority: Critical
>              Labels: patch
>
> When the query fields (qf) use different fieldTypes, especially when one uses KeywordTokenizerFactory and another uses WhitespaceTokenizerFactory, the generated parsed query does not honor synonyms and mm, so the wrong documents are matched. Details:
>  # We use Solr 7.3.1.
>  # Our qf is name^10 partNumber_ntk. The fieldType of name uses solr.WhitespaceTokenizerFactory and solr.WordDelimiterFilterFactory, while partNumber_ntk is not tokenized and uses solr.KeywordTokenizerFactory.
>  # mm=2<3 4<5 6<-80%25
>  # The search term is versatil sundress, where 'versatile' and 'testing' are synonyms. We have documents named "Versatil Empire Waist Sundress" that should be hit, but they are not.
>  # The same query works fine on Solr 5.5.4 but does not work on Solr 7.3.1.
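The mm value in the list above is a conditional minimum-should-match spec ("%25" is the URL-encoded "%"). As a rough illustration of how such a spec is usually read, here is a minimal Python sketch. It follows the documented rules as I understand them ("n<x" applies x only when there are more than n optional clauses, and a negative value says how many clauses may be missing). The function name min_should_match is made up for this example and this is not Solr's own code; rounding in edge cases may differ.

```python
def min_should_match(optional_clauses: int, spec: str) -> int:
    """Return how many optional clauses must match under an mm spec (sketch)."""
    result = optional_clauses
    spec = spec.strip()
    if "<" in spec:
        # Conditional form: walk the "n<x" parts from left to right.
        for part in spec.split():
            bound, value = part.split("<", 1)
            if optional_clauses <= int(bound):
                return result
            result = min_should_match(optional_clauses, value)
        return result
    if spec.endswith("%"):
        percent = int(spec[:-1])
        calc = int(optional_clauses * percent / 100)  # truncates toward zero
    else:
        calc = int(spec)
    # A negative value means "this many clauses may be missing".
    result = optional_clauses + calc if calc < 0 else calc
    return max(0, min(result, optional_clauses))


MM = "2<3 4<5 6<-80%"  # decoded form of mm=2<3 4<5 6<-80%25
for n in (2, 3, 5, 10):
    print(n, min_should_match(n, MM))
```

Under this reading, a query with two top-level optional clauses requires both of them, which agrees with the ~2 suffix in the parsed queries quoted in this report.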
> q=(Versatil%20testing)%20sundress&fl=name&defType=edismax&mm=2<3 4<5 6<-80%25&qf=name^10%20partNumber_ntk&debugQuery=true&wt=xml&rows=100
> parsedQuery:
> +(DisjunctionMaxQuery((((name:versatil name:test)~2)^10.0 | partNumber_ntk:versatil testing)) DisjunctionMaxQuery(((name:sundress)^10.0 | partNumber_ntk:sundress)))~2
> The name field seems to be parsed incorrectly, into: name:versatil name:test
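To make the difference between the two fieldTypes concrete, here is a small Python sketch with toy stand-ins for the two analyzer chains. The function names are hypothetical, and stop words, word delimiting and stemming are left out (so "testing" is not reduced to "test"). It only illustrates why the same input produces two terms for name but a single term for partNumber_ntk; it does not reproduce the query parser or explain the regression.

```python
def whitespace_analyze(text: str) -> list[str]:
    """Toy stand-in for WhitespaceTokenizer + lowercase: one token per word."""
    return [token.lower() for token in text.split()]


def keyword_analyze(text: str) -> list[str]:
    """Toy stand-in for KeywordTokenizer + lowercase + trim: one token total."""
    return [text.strip().lower()]


phrase = "Versatil testing"
print(whitespace_analyze(phrase))  # ['versatil', 'testing']
print(keyword_analyze(phrase))     # ['versatil testing']
```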
> If I change the query fields so that they share the same fieldType (for example, shortDescription has the same fieldType as name):
> q=(Versatil%20testing)%20sundress&fl=name&defType=edismax&mm=2<3 4<5 6<-80%25&qf=name^10%20shortDescription&debugQuery=true&wt=xml&rows=100
> parsedQuery:
> +((DisjunctionMaxQuery(((name:versatil)^10.0 | shortDescription:versatil)) DisjunctionMaxQuery(((name:test)^10.0 | shortDescription:test))) DisjunctionMaxQuery(((name:sundress)^10.0 | shortDescription:sundress)))~2
> which hits the expected documents.
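For anyone who wants to rerun the two requests side by side, here is a minimal Python sketch that builds both URLs with every parameter percent-encoded, including the "<", "^" and "%" characters in mm and qf. The base URL and core name are assumptions and need to be replaced with a real Solr endpoint; build_select_url is a helper invented for this example.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, qf: str) -> str:
    """Build the edismax debug request from this report for a given qf."""
    params = {
        "q": "(Versatil testing) sundress",
        "fl": "name",
        "defType": "edismax",
        "mm": "2<3 4<5 6<-80%",
        "qf": qf,
        "debugQuery": "true",
        "wt": "xml",
        "rows": "100",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint: host, port and core name are assumptions.
BASE = "http://localhost:8983/solr/mycore/select"
mixed = build_select_url(BASE, "name^10 partNumber_ntk")   # different fieldTypes
same = build_select_url(BASE, "name^10 shortDescription")  # same fieldType
print(mixed)
print(same)
```

Comparing the parsedquery section of the two debug responses should show the same difference as the parsed queries quoted in this report.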
> Could someone check this or suggest a quick workaround? It currently has a big impact on our customer.
> Thanks in advance! The following is backup information:
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

