lucene-solr-user mailing list archives

From Wouter Admiraal <...@wadmiraal.net>
Subject Re: When using Dismax, Solr 5.1 tries to compare the entire field to the search string, instead of only using keywords
Date Thu, 04 Jun 2015 14:03:10 GMT
Hi, thanks for the response.

Label field:
<field name="label" type="text" indexed="true" stored="true"
       termVectors="true" omitNorms="true"/>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="txt/stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="0" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="3"
                maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="txt/synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="txt/stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="0" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

I can probably simplify the above config a bit, for example by using a
single <analyzer> for both index and query time. But for now, this is
how it is set up.
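
As a rough illustration of what the index analyzer above should produce,
here is a simplified Python sketch. It is not Solr code: it only mimics
the whitespace tokenizer, the lowercase filter, and the n-gram filter
(3 to 25 characters), and it ignores the stop word, word delimiter, and
synonym filters. The function names and the sample label are made up for
the example.

```python
def index_terms(text, min_gram=3, max_gram=25):
    """Approximate the index-time chain: whitespace split, lowercase, n-grams.

    Simplification: stop words and word-delimiter handling are not modeled.
    """
    terms = set()
    for token in text.split():
        token = token.lower()
        # Emit every substring whose length is between min_gram and max_gram.
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            for start in range(len(token) - size + 1):
                terms.add(token[start:start + size])
    return terms


def query_terms(text):
    """Approximate the query-time chain: whitespace split and lowercase only."""
    return [token.lower() for token in text.split()]


if __name__ == "__main__":
    terms = index_terms("Fast Food Restaurant")  # hypothetical label value
    print(sorted(index_terms("Food")))   # ['foo', 'food', 'ood']
    print(query_terms("Food"))           # ['food']
    print("food" in terms)               # True
    print("Food" in terms)               # False
```

Under these assumptions, a lowercased query term such as `food` is found
among the indexed grams, which is the "contains" behavior I expect. A term
that keeps its capital letter, such as `Food`, cannot match any of the
lowercased grams.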

Just as a side-question: is dismax *supposed* to match fields exactly
with the search query? Or is my expectation correct, meaning it should
"tokenize" the field, just as with regular searches? It just doesn't
seem intuitive to me.

Thank you again for your help.

Kind regards,
Wouter Admiraal


2015-06-04 14:52 GMT+02:00 Shawn Heisey <apache@elyograg.org>:
> On 6/4/2015 1:22 AM, Wouter Admiraal wrote:
>> When I turn on debug, I get the following:
>>
>> "debug": {
>>   "rawquerystring": "Food",
>>   "querystring": "Food",
>>   "parsedquery": "(+DisjunctionMaxQuery((label:Food^3.0)) ())/no_coord",
>>   "parsedquery_toString": "+(label:Food^3.0) ()",
>>   "explain": {},
>>   "QParser": "DisMaxQParser",
>>   "altquerystring": null,
>>   "boostfuncs": null,
>>   ...
>> }
>>
>> I don't understand how/why this doesn't use a "contains" operator.
>> This was the behavior on the old 1.4 instance. I went through the
>> changelog for 1.4 to 5.1, but I don't find any explicit information
>> about dismax behaving differently, except the "mm" parameter needs a
>> default. I tried many values for mm (including 0, 100%, 100, etc) but
>> to no avail.
>
> In your schema.xml, what is the definition of the label field, and the
> fieldType definition of the type used in the label field?  That will
> determine exactly how the query is parsed and whether individual words
> will match.  I wasn't using dismax or edismax back when I was running
> 1.4, so I can't say anything about how it used to work, only how it
> works now.
>
> Thanks,
> Shawn
>
