lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
Date Wed, 22 Jun 2011 16:16:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053329#comment-13053329 ]

Robert Muir commented on LUCENE-3220:
-------------------------------------

Just took a look, a few things that might help:

* Yes, maxDoc does not reflect deletions, but neither do statistics like totalTermFreq or
docFreq... so it's best not to worry about deletions in scoring, and to stay consistent by
using the stats that do not take deletions into account (e.g. maxDoc, not numDocs).
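To illustrate why mixing the two kinds of statistics is risky, here is a minimal Java sketch. It assumes a classic Lucene-style IDF, 1 + ln(N / (df + 1)); the class and method names are hypothetical and are not the flexscoring API. Because a deleted document still contributes to docFreq until its segment is merged, pairing docFreq with numDocs can yield a df larger than N, while pairing it with maxDoc keeps both counts on the same deletion-unaware basis.

```java
// Hypothetical sketch, not the flexscoring API: shows why docFreq should be
// paired with maxDoc (both ignore deletions) rather than numDocs (live docs only).
final class IdfConsistency {
    private IdfConsistency() {}

    /** Classic Lucene-style IDF: 1 + ln(N / (df + 1)). */
    static double idf(long docFreq, long docCount) {
        return 1.0 + Math.log(docCount / (double) (docFreq + 1));
    }

    /** Consistent: maxDoc and docFreq both still count deleted documents. */
    static double idfWithMaxDoc(long docFreq, long maxDoc) {
        return idf(docFreq, maxDoc);
    }

    /**
     * Inconsistent: numDocs excludes deleted documents but docFreq does not,
     * so docFreq may exceed numDocs and the log term turns negative.
     */
    static double idfWithNumDocs(long docFreq, long numDocs) {
        return idf(docFreq, numDocs);
    }
}
```

For example, with maxDoc = 100, numDocs = 10 (90 documents deleted) and docFreq = 12, idfWithMaxDoc uses ln(100/13) > 0, whereas idfWithNumDocs uses ln(10/13) < 0, i.e. it treats the term as occurring in more documents than exist.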

* For computeStats(TermContext... termContexts), it's weird to sum the DF across the different
terms in that case (e.g. a phrase). I honestly don't have any suggestions here... maybe in this
case we should make an EasyPhraseStats that computes the EasyStats for each term, so it's not
hiding anything or limiting anyone? You could then do an instanceof check and forward to a
different method like scorePhrase() when it's an EasyPhraseStats. In general I'm not sure how
other ranking systems tend to handle this case; the phrase estimation for IDF in Lucene's
formula is done by summing the IDFs.
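A minimal Java sketch of the EasyPhraseStats idea, assuming EasyStats carries only maxDoc and docFreq and that a model returns a double score. The class names EasyStats, EasyPhraseStats and EasySimilarity and the method scorePhrase() come from the comment, but their fields and signatures, the scoreAny() entry point and the TfIdfEasySimilarity toy model are all hypothetical. The default scorePhrase() sums per-term IDFs, mirroring Lucene's phrase IDF, and uses tf = sqrt(freq) as DefaultSimilarity does; the rest of the formula is simplified for illustration.

```java
import java.util.List;

// Hypothetical sketch: fields and signatures below are assumptions, not flexscoring code.
class EasyStats {
    final long maxDoc;   // deletion-unaware, consistent with docFreq
    final long docFreq;

    EasyStats(long maxDoc, long docFreq) {
        this.maxDoc = maxDoc;
        this.docFreq = docFreq;
    }
}

/** Keeps one EasyStats per phrase term instead of collapsing them into a summed DF. */
final class EasyPhraseStats extends EasyStats {
    final List<EasyStats> termStats;

    EasyPhraseStats(List<EasyStats> termStats) {
        super(-1, -1); // the inherited single-term fields are unused for a phrase
        this.termStats = termStats;
    }
}

abstract class EasySimilarity {
    /** Classic Lucene-style IDF, used here only as an example. */
    static double idf(long docFreq, long maxDoc) {
        return 1.0 + Math.log(maxDoc / (double) (docFreq + 1));
    }

    /** Single-term scoring, implemented by each ranking model. */
    protected abstract double score(EasyStats stats, double freq);

    /** Entry point: phrase stats are detected with instanceof and forwarded. */
    final double scoreAny(EasyStats stats, double freq) {
        if (stats instanceof EasyPhraseStats) {
            return scorePhrase((EasyPhraseStats) stats, freq);
        }
        return score(stats, freq);
    }

    /** Default phrase handling: tf = sqrt(freq) times the sum of per-term IDFs. */
    protected double scorePhrase(EasyPhraseStats stats, double freq) {
        double idfSum = 0.0;
        for (EasyStats s : stats.termStats) {
            idfSum += idf(s.docFreq, s.maxDoc);
        }
        return Math.sqrt(freq) * idfSum;
    }
}

/** Toy single-term model: tf = sqrt(freq) times the term's IDF. */
final class TfIdfEasySimilarity extends EasySimilarity {
    @Override
    protected double score(EasyStats stats, double freq) {
        return Math.sqrt(freq) * idf(stats.docFreq, stats.maxDoc);
    }
}
```

Keeping the per-term stats means a model that wants something other than summed IDFs can simply override scorePhrase(); summing the DFs up front would already have discarded which of the terms was rare.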


> Implement various ranking models as Similarities
> ------------------------------------------------
>
>                 Key: LUCENE-3220
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3220
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/search
>    Affects Versions: flexscoring branch
>            Reporter: David Mark Nemeskey
>            Assignee: David Mark Nemeskey
>              Labels: gsoc
>         Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

