lucene-java-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject BlendedTermQuery causing negative IDF?
Date Tue, 19 Apr 2016 13:48:06 GMT
Hello,

I just wrote a Solr query parser for BlendedTermQuery on Lucene 6.0 using BM25 similarity, and
I have a very simple unit test to check that it works at all. To my surprise,
one of the results has a negative score, caused by a negative IDF: docFreq is higher
than docCount for that term on that field. Here are the test documents:

    assertU(adoc("id", "1", "text", "rare term"));
    assertU(adoc("id", "2", "text_nl", "less rare term"));
    assertU(adoc("id", "3", "text_nl", "rarest term"));
    assertU(commit());

My query parser creates the following Lucene query: BlendedTermQuery(Blended(text:rare text:term
text_nl:rare text_nl:term)), which looks fine to me. But this is what I get back when
issuing that query against the above set of documents; the third result (id 1) is the one with a
negative score.

<result name="response" numFound="3" start="0" maxScore="0.1805489">
  <doc>
    <str name="id">3</str>
    <float name="score">0.1805489</float></doc>
  <doc>
    <str name="id">2</str>
    <float name="score">0.14785346</float></doc>
  <doc>
    <str name="id">1</str>
    <float name="score">-0.004004207</float></doc>
</result>
<lst name="debug">
  <str name="rawquerystring">{!blended fl=text,text_nl}rare term</str>
  <str name="querystring">{!blended fl=text,text_nl}rare term</str>
  <str name="parsedquery">BlendedTermQuery(Blended(text:rare text:term text_nl:rare
text_nl:term))</str>
  <str name="parsedquery_toString">Blended(text:rare text:term text_nl:rare text_nl:term)</str>
  <lst name="explain">
    <str name="3">
0.1805489 = max plus 0.01 times others of:
  0.1805489 = weight(text_nl:term in 2) [], result of:
    0.1805489 = score(doc=2,freq=1.0 = termFreq=1.0
), product of:
      0.18232156 = idf(docFreq=2, docCount=2)
      0.9902773 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        2.5 = avgFieldLength
        2.56 = fieldLength
</str>
    <str name="2">
0.14785345 = max plus 0.01 times others of:
  0.14638956 = weight(text_nl:rare in 1) [], result of:
    0.14638956 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
      0.18232156 = idf(docFreq=2, docCount=2)
      0.8029196 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        2.5 = avgFieldLength
        4.0 = fieldLength
  0.14638956 = weight(text_nl:term in 1) [], result of:
    0.14638956 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
      0.18232156 = idf(docFreq=2, docCount=2)
      0.8029196 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        2.5 = avgFieldLength
        4.0 = fieldLength
</str>
    <str name="1">
-0.004004207 = max plus 0.01 times others of:
  -0.20021036 = weight(text:rare in 0) [], result of:
    -0.20021036 = score(doc=0,freq=1.0 = termFreq=1.0
), product of:
      -0.22314355 = idf(docFreq=2, docCount=1)
      0.89722675 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        2.0 = avgFieldLength
        2.56 = fieldLength
  -0.20021036 = weight(text:term in 0) [], result of:
    -0.20021036 = score(doc=0,freq=1.0 = termFreq=1.0
), product of:
      -0.22314355 = idf(docFreq=2, docCount=1)
      0.89722675 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        2.0 = avgFieldLength
        2.56 = fieldLength
</str>

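For reference, both IDF values in the explain output are consistent with the usual BM25 form
ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). The sketch below assumes that formula; the
class and method names are made up for illustration and are not Lucene API. It shows that the
value turns negative as soon as docFreq exceeds docCount.

```java
// Illustrative sketch only: names here are hypothetical, not part of Lucene.
final class Bm25IdfSketch {

    // Assumed BM25 IDF: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)).
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        // text_nl: docFreq=2, docCount=2 gives ln(1.2), which is positive.
        System.out.println("text_nl idf = " + idf(2, 2));
        // text: docFreq=2, docCount=1 gives ln(0.8), which is negative.
        System.out.println("text idf    = " + idf(2, 1));
    }
}
```

Under this formula the argument of the logarithm drops below 1 whenever docFreq > docCount,
so any term statistics in that state produce a negative IDF.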
What am I doing wrong? Or did I catch a bug?
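
The tfNorm values and the "max plus 0.01 times others" totals in the explain output can be
checked by hand as well. The sketch below is arithmetic only, under two assumptions: tfNorm is
tf * (k1 + 1) / (tf + k1 * (1 - b + b * fieldLength / avgFieldLength)), and the running maximum
in the "max plus tie-breaker" step starts at 0, so a negative clause score never becomes the max.
The second assumption is inferred from the numbers, not taken from the Lucene source, and all
names are hypothetical.

```java
// Illustrative sketch only: names here are hypothetical, not part of Lucene.
final class ExplainArithmeticSketch {

    // Assumed BM25 term-frequency normalisation.
    static double tfNorm(double tf, double k1, double b, double fieldLength, double avgFieldLength) {
        return tf * (k1 + 1) / (tf + k1 * (1 - b + b * fieldLength / avgFieldLength));
    }

    // "max plus tieBreaker times others". Assumption: max starts at 0,
    // so when every clause score is negative the max stays 0.
    static double maxPlusTieBreak(double tieBreaker, double... scores) {
        double sum = 0;
        double max = 0;
        for (double s : scores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + (sum - max) * tieBreaker;
    }

    public static void main(String[] args) {
        // Result with id 1: fieldLength=2.56, avgFieldLength=2.0, idf as reported.
        double clause = -0.22314355 * tfNorm(1.0, 1.2, 0.75, 2.56, 2.0);
        System.out.println("clause score = " + clause);
        // Two identical negative clauses (text:rare and text:term).
        System.out.println("total score  = " + maxPlusTieBreak(0.01, clause, clause));
    }
}
```

With these assumptions the two negative clause scores contribute only through the 0.01
tie-breaker, which would explain why the total is a small negative number.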

Thanks,
Markus
