lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-8025) compute avgdl correctly for DOCS_ONLY
Date Tue, 31 Oct 2017 02:27:00 GMT
Robert Muir created LUCENE-8025:
-----------------------------------

             Summary: compute avgdl correctly for DOCS_ONLY
                 Key: LUCENE-8025
                 URL: https://issues.apache.org/jira/browse/LUCENE-8025
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Robert Muir


Spinoff of LUCENE-8007:

If you omit term frequencies, we should score as if all tf values were 1. This is how
it worked for e.g. ClassicSimilarity, and it is easy to understand how scoring degrades there.

However, for sims such as BM25, we currently bail out on computing the average document length
(avgdl) and just return a bogus value of 1. That also screws up length normalization, which is
a separate concern from tf.

Instead of a bogus value, we should substitute sumDocFreq for sumTotalTermFreq (every posting
has a freq of 1, since you omitted the frequencies).
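
A minimal sketch of the proposed fallback, not the actual patch. It assumes the field
statistics arrive as plain longs and that a negative sumTotalTermFreq signals omitted
frequencies; the class and method names (AvgdlSketch, avgFieldLength) are hypothetical
and are not Lucene APIs.

```java
public final class AvgdlSketch {
    private AvgdlSketch() {}

    /**
     * Average field length (avgdl) for length normalization.
     *
     * @param sumTotalTermFreq total term occurrences in the field; assumed to be
     *                         negative when frequencies were omitted (DOCS_ONLY)
     * @param sumDocFreq       total number of postings (term/document pairs) in the field
     * @param docCount         number of documents that have at least one term in the field
     */
    public static float avgFieldLength(long sumTotalTermFreq, long sumDocFreq, long docCount) {
        if (docCount <= 0) {
            throw new IllegalArgumentException("docCount must be positive, got " + docCount);
        }
        // With frequencies omitted, every posting counts as freq 1, so the number of
        // postings stands in for the total number of term occurrences.
        long totalLength = sumTotalTermFreq >= 0 ? sumTotalTermFreq : sumDocFreq;
        return (float) (totalLength / (double) docCount);
    }

    public static void main(String[] args) {
        // Three documents with 2, 3, and 4 distinct terms: 9 postings in total.
        System.out.println(avgFieldLength(-1, 9, 3));
        // Same field indexed with frequencies, 12 term occurrences in total.
        System.out.println(avgFieldLength(12, 9, 3));
    }
}
```

In this sketch a DOCS_ONLY field gets an avgdl equal to the average number of distinct
terms per document, rather than a constant 1.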




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

