lucene-solr-user mailing list archives

From Thomas Aglassinger <t.aglassin...@netconomy.net>
Subject Inconsistent debugQuery score with multiplicative boost
Date Fri, 04 Jan 2019 08:11:33 GMT
Hi!

When debugging a query that uses a multiplicative boost based on the product() function, I noticed
that the score computed in the explain section is correct, while the score in the actual result
is wrong for some documents.

As an example, here’s a simple query that boosts on the field name_text_de (containing German
product names). The term “Netzteil” boosts the score to 200% and “Sony” boosts it to 300%, so a name
that contains both terms would be boosted to 600%. If a term does not match, a default pseudo
boost of 1 (the multiplicative identity) is used. The params of the responseHeader in the query
result are:

"q":"{!boost b=$ymb}(+{!lucene v=$yq})",
"ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
"yq":"*:*",
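The intended arithmetic of the ymb expression can be sketched in Python: each query() clause contributes its constant score (the ^= value) when the term matches and the default of 1 otherwise, and product() multiplies the results. This is a minimal model that assumes term matching has already been decided (analysis is not simulated); the function name expected_boost is hypothetical.

```python
from math import prod

# Boost per term, mirroring the ^= constant scores in the ymb parameter.
BOOSTS = {"netzteil": 2.0, "sony": 3.0}

def expected_boost(matched_terms, boosts=BOOSTS, default=1.0):
    """Model product(query(t1, 1), query(t2, 1), ...) for one document.

    Each query() yields its constant score if the term matched and the
    default value (1, the multiplicative identity) otherwise.
    """
    return prod(boosts[t] if t in matched_terms else default for t in boosts)

# The base query *:* scores 1.0, so the final score should equal the boost.
print(expected_boost({"netzteil", "sony"}))  # 6.0
print(expected_boost({"netzteil"}))          # 2.0
print(expected_boost(set()))                 # 1.0
```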

With the ymb parameter substituted, the parsed query translates to:

FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)))))

For a product that contains both terms, the score in the result and explain section correctly
yields 6.0:

"name_text_de":"Original Sony Vaio Netzteil",
"score":6.0,

6.0 = product of:
  1.0 = boost
  6.0 = product of:
    1.0 = *:*
    6.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=3.0)

However, for a product with only “Netzteil” in the name, the score in the result is wrongly 1.0,
while the explain section correctly computes 2.0:

"name_text_de":"GS-Netzteil 20W schwarz",
"score":1.0,

2.0 = product of:
  1.0 = boost
  2.0 = product of:
    1.0 = *:*
    2.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=1.0)
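To spot such mismatches across a whole result page, a small Python check can compare each document's score with the top-level value of its explanation. This sketch assumes the default JSON response layout with debugQuery=true, where debug.explain maps each uniqueKey to a plain-text explanation whose first line starts with the score, and it assumes the uniqueKey field is called "id". The document ids below are hypothetical; the scores and explain headers are the ones from the two examples above.

```python
def explain_score(explanation: str) -> float:
    """Return the top-level value of a plain-text explain, e.g. '6.0 = product of:'."""
    first_line = explanation.strip().splitlines()[0]
    return float(first_line.split(" = ", 1)[0])

def score_mismatches(response: dict, tolerance: float = 1e-6):
    """Yield (id, result_score, explain_score) where the two disagree."""
    explain = response["debug"]["explain"]
    for doc in response["response"]["docs"]:
        expected = explain_score(explain[doc["id"]])
        if abs(doc["score"] - expected) > tolerance:
            yield doc["id"], doc["score"], expected

# Hypothetical ids; remaining explain lines are omitted for brevity.
sample = {
    "response": {"docs": [
        {"id": "vaio-psu", "name_text_de": "Original Sony Vaio Netzteil", "score": 6.0},
        {"id": "gs-psu", "name_text_de": "GS-Netzteil 20W schwarz", "score": 1.0},
    ]},
    "debug": {"explain": {
        "vaio-psu": "\n6.0 = product of:\n  1.0 = boost\n",
        "gs-psu": "\n2.0 = product of:\n  1.0 = boost\n",
    }},
}

print(list(score_mismatches(sample)))  # [('gs-psu', 1.0, 2.0)]
```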

(Note: the filter chain splits words on hyphens, so the “GS-” in front of “Netzteil”
should not be an issue.)

Here’s the complete filter chain for the text_de field type:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.ManagedSynonymGraphFilterFactory" managed="de" />
        <filter class="solr.ManagedStopFilterFactory" managed="de" />
        <filter class="solr.WordDelimiterGraphFilterFactory"  preserveOriginal="1"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" />
        <filter class="solr.GermanStemFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>
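The claim that the hyphen is harmless can be checked roughly with a Python approximation of the relevant part of this chain: whitespace tokenization, WordDelimiterGraphFilter-style splitting on hyphens with preserveOriginal and catenateWords, then lowercasing. This is a simplified model, not Lucene's implementation: synonyms, stop words, case-change and letter/digit splits (the real filter would also split “20W”), ASCII folding and German stemming are deliberately left out, and the function name approximate_tokens is hypothetical. The Analysis screen of the Solr Admin UI shows the real token stream.

```python
def approximate_tokens(text: str) -> set:
    """Roughly approximate text_de index tokens for hyphenated words (hypothetical helper)."""
    tokens = set()
    for word in text.split():                      # WhitespaceTokenizer
        parts = [p for p in word.split("-") if p]  # split on hyphen delimiters only
        tokens.add(word)                           # preserveOriginal="1"
        tokens.update(parts)                       # generateWordParts="1"
        if len(parts) > 1:
            tokens.add("".join(parts))             # catenateWords="1"
    return {t.lower() for t in tokens}             # LowerCaseFilter

print(sorted(approximate_tokens("GS-Netzteil 20W schwarz")))
# ['20w', 'gs', 'gs-netzteil', 'gsnetzteil', 'netzteil', 'schwarz']
```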

Interestingly, if I simplify the query to boost only on “Netzteil”, the score is correctly 2.0
in both the result and the explain section.
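To reproduce both variants side by side, the request parameters can be generated from a list of (term, boost) pairs using only the Python standard library. This is a sketch: the helper names boost_clause and build_params are hypothetical, the “products” collection URL is a placeholder, the fl parameter is an added convenience, and emitting a single query() clause without a product() wrapper for the one-term case is an assumption about what the simplified query looked like.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/products/select"  # placeholder collection

def boost_clause(field: str, term: str, boost: float) -> str:
    """One query() clause: constant score `boost` if `term` matches, else 1."""
    return f'query({{!v="{field}\\:{term}\\^={boost}"}},1)'

def build_params(field: str, boosts: list) -> dict:
    """Request parameters for the multiplicative boost query (hypothetical helper)."""
    clauses = [boost_clause(field, term, boost) for term, boost in boosts]
    ymb = clauses[0] if len(clauses) == 1 else f"product({','.join(clauses)})"
    return {
        "q": "{!boost b=$ymb}(+{!lucene v=$yq})",
        "ymb": ymb,
        "yq": "*:*",
        "fl": "name_text_de,score",
        "debugQuery": "true",
    }

full = build_params("name_text_de", [("Netzteil", 2.0), ("Sony", 3.0)])
netzteil_only = build_params("name_text_de", [("Netzteil", 2.0)])
print(full["ymb"])
print(SOLR_SELECT + "?" + urlencode(netzteil_only))
```

Comparing the score of “GS-Netzteil 20W schwarz” in the two responses isolates whether the second query() clause is what changes the result score.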

I reproduced this with a local Solr 7.5.0 server (no sharding, no replica) on Mac OS X 10.14.1.

I found mention of a somewhat similar situation with BooleanQuery, which was considered a
bug and fixed in 2016: https://issues.apache.org/jira/browse/LUCENE-7132

So my questions are:

1. Is there something wrong in my query that prevents the “Netzteil”-only product from getting
a score of 2.0?
2. Shouldn’t the score in the result and the explain section always be the same?

Best regards,
Thomas