lucene-dev mailing list archives

From "David Mark Nemeskey (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-3174) Similarity.Stats class for term & collection statistics
Date Sun, 05 Jun 2011 15:23:47 GMT
Similarity.Stats class for term & collection statistics
-------------------------------------------------------

                 Key: LUCENE-3174
                 URL: https://issues.apache.org/jira/browse/LUCENE-3174
             Project: Lucene - Java
          Issue Type: Sub-task
          Components: core/search
    Affects Versions: flexscoring branch
            Reporter: David Mark Nemeskey
            Assignee: David Mark Nemeskey
            Priority: Minor


To support ranking methods other than TF-IDF, we need to make the statistics they require
available. These statistics could be computed in computeWeight (soon to become computeStats)
and stored in a separate object for easy access. Since this object will be used solely by
subclasses of Similarity, it should be implemented as a static nested class, i.e. Similarity.Stats.

There are two ways this could be implemented:
- as a single Similarity.Stats class, reused by all ranking algorithms. In this case, the
class would have a member field for every statistic used by any algorithm;
- as a hierarchy of Stats classes, one for each ranking algorithm. Each subclass would define
only the statistics needed by its ranking algorithm.

In the second case, the Stats class in DefaultSimilarity would have a single field, idf, while
the one in e.g. BM25Similarity would have idf and average field/document length.
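
To make the two options concrete, here is a rough Java sketch. It is only an illustration, not code from the flexscoring branch: the class names (StatsSketch, SingleStats, TfIdfStats, BM25Stats), the field names and the helper methods are all hypothetical, and the idf formula is the usual log(numDocs / (docFreq + 1)) + 1 form, assumed here for the sake of the example.

```java
// Hypothetical sketch of the two designs; none of these names are actual Lucene API.
public class StatsSketch {

    /** Option 1: a single class with a field for every statistic any method may need. */
    static class SingleStats {
        float idf;            // used by TF-IDF and BM25
        float avgFieldLength; // used by BM25 only; left at 0 for TF-IDF
    }

    /** Option 2: a common base type with one subclass per ranking method. */
    abstract static class Stats {
    }

    /** Statistics for a TF-IDF style similarity: idf only. */
    static final class TfIdfStats extends Stats {
        final float idf;

        TfIdfStats(float idf) {
            this.idf = idf;
        }
    }

    /** Statistics for a BM25 style similarity: idf plus average field length. */
    static final class BM25Stats extends Stats {
        final float idf;
        final float avgFieldLength;

        BM25Stats(float idf, float avgFieldLength) {
            this.idf = idf;
            this.avgFieldLength = avgFieldLength;
        }
    }

    /** Assumed idf formula: log(numDocs / (docFreq + 1)) + 1. */
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** What a computeStats method might return in a TF-IDF similarity. */
    static Stats tfIdfStats(long docFreq, long numDocs) {
        return new TfIdfStats(idf(docFreq, numDocs));
    }

    /** What a computeStats method might return in a BM25 similarity. */
    static Stats bm25Stats(long docFreq, long numDocs, float avgFieldLength) {
        return new BM25Stats(idf(docFreq, numDocs), avgFieldLength);
    }
}
```

With the hierarchy, each Similarity subclass only sees the fields it actually uses, at the cost of a cast back to its own Stats type at scoring time. With the single class no cast is needed, but every new ranking method adds fields that the other methods ignore.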

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

