lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3381) Sandbox remaining contrib queries
Date Thu, 18 Aug 2011 09:53:28 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086920#comment-13086920 ]

Robert Muir commented on LUCENE-3381:
-------------------------------------

{quote}
To work across different scoring systems generically I expect IDF-tweakage would need to be
made a pluggable aspect of all these scoring strategies e.g. through a common interface. Messy.
{quote}

I don't think we need to do that?
I added a comment to the source code:
{noformat}
    // TODO: generalize this query (at least it should not reuse this static sim!
    // a better way might be to convert this into multitermquery rewrite methods.
    // the rewrite method can 'average' the TermContext's term statistics (docfreq,totalTermFreq)
    // provided to TermQuery, so that the general idea is agnostic to any scoring system...
{noformat}

I don't think this is really that hard, or messy?
This Query would then just rewrite() to a BooleanQuery of ordinary FuzzyQuery clauses, setting
its custom rewrite method on each (it looks like we need to implement two rewrite methods here,
depending upon configuration).

The rewrite methods would average docFreq and totalTermFreq (the only two "collection-wide"
term statistics Lucene supports), and set these in the TermContexts that they pass to TermQuery.
Then the concept works for all scoring systems.
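
To make the averaging step concrete, here is a minimal standalone sketch. It deliberately uses no Lucene classes: TermStats and average() are hypothetical names invented for illustration, standing in for the per-term docFreq and totalTermFreq that a rewrite method would read and then place in the TermContext it hands to TermQuery. Integer averaging (truncating division) is an assumption, not something the comment above specifies.

```java
import java.util.Arrays;
import java.util.List;

public class AveragedTermStats {
    /** Hypothetical holder for the two collection-wide statistics (not a Lucene class). */
    static final class TermStats {
        final int docFreq;
        final long totalTermFreq;

        TermStats(int docFreq, long totalTermFreq) {
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /** Averages docFreq and totalTermFreq over all expanded terms (truncating division). */
    static TermStats average(List<TermStats> expanded) {
        if (expanded.isEmpty()) {
            throw new IllegalArgumentException("need at least one expanded term");
        }
        long docFreqSum = 0;
        long totalTermFreqSum = 0;
        for (TermStats s : expanded) {
            docFreqSum += s.docFreq;
            totalTermFreqSum += s.totalTermFreq;
        }
        int n = expanded.size();
        return new TermStats((int) (docFreqSum / n), totalTermFreqSum / n);
    }

    public static void main(String[] args) {
        // Three fuzzy expansions of one query term, each with its own statistics.
        TermStats avg = average(Arrays.asList(
                new TermStats(10, 15),
                new TermStats(20, 45),
                new TermStats(30, 60)));
        System.out.println("docFreq=" + avg.docFreq + " totalTermFreq=" + avg.totalTermFreq);
    }
}
```

Every expanded term would then be scored as if it had the same averaged statistics, which is what keeps the idea independent of any particular scoring system.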

As a side benefit, this would also improve performance: the term rewrite becomes single-pass,
instead of doing wasted seeks per-segment * per-term.
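
The seek arithmetic can be illustrated with a toy model. This is only a sketch under stated assumptions: Segment, twoPass() and singlePass() are hypothetical names, and a HashMap lookup stands in for a term dictionary seek; none of this is Lucene's actual term dictionary code. It assumes the two-pass case looks every term up once during the rewrite and once more when the per-term query runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SeekCountSketch {
    /** Toy segment: term -> docFreq, plus a counter of dictionary lookups ("seeks"). */
    static final class Segment {
        final Map<String, Integer> dict = new HashMap<>();
        int seeks = 0;

        Integer seek(String term) {
            seeks++;
            return dict.get(term);
        }
    }

    static List<Segment> makeSegments(int count) {
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            segments.add(new Segment());
        }
        return segments;
    }

    static int totalSeeks(List<Segment> segments) {
        int total = 0;
        for (Segment seg : segments) {
            total += seg.seeks;
        }
        return total;
    }

    /** Two passes: the rewrite discards what it found, so the per-term query seeks again. */
    static int twoPass(List<Segment> segments, List<String> terms) {
        for (Segment seg : segments) {
            for (String term : terms) {
                seg.seek(term); // rewrite pass
            }
        }
        for (String term : terms) {
            for (Segment seg : segments) {
                seg.seek(term); // repeated seek: per-segment * per-term
            }
        }
        return totalSeeks(segments);
    }

    /** Single pass: the rewrite keeps what it found, so no second round of seeks is needed. */
    static int singlePass(List<Segment> segments, List<String> terms) {
        Map<String, Map<Integer, Integer>> cached = new HashMap<>();
        for (int i = 0; i < segments.size(); i++) {
            for (String term : terms) {
                Integer docFreq = segments.get(i).seek(term);
                if (docFreq != null) {
                    cached.computeIfAbsent(term, k -> new HashMap<>()).put(i, docFreq);
                }
            }
        }
        // A later per-term query would read from 'cached' and never call seek().
        return totalSeeks(segments);
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("lucene", "lucent", "lucens", "lucena");
        System.out.println("two-pass seeks:    " + twoPass(makeSegments(3), terms));
        System.out.println("single-pass seeks: " + singlePass(makeSegments(3), terms));
    }
}
```

In this model the difference between the two counts is exactly segments * terms, which is the "per-segment * per-term" waste described above.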


> Sandbox remaining contrib queries
> ---------------------------------
>
>                 Key: LUCENE-3381
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3381
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Chris Male
>         Attachments: LUCENE-3381.patch
>
>
> In LUCENE-3271, I moved the 'good' queries from the queries contrib to new destinations
> (primarily the queries module).  The remnants now need to find their home.  As suggested in
> LUCENE-3271, these classes are not bad per se, just odd.  So let's create a sandbox contrib
> that they and other 'odd' contrib classes can go to.  We can then decide their fate at another
> time.
