lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-6276) Add matchCost() api to TwoPhaseDocIdSetIterator
Date Sat, 21 Feb 2015 19:55:12 GMT


Robert Muir commented on LUCENE-6276:

I'm curious if you already have concrete ideas for the match costs of our existing queries?

See above in the description. We know the average number of positions per doc (totalTermFreq/docFreq)
and so on. So we can compute the amortized cost of reading one position, and it's easy from

Maybe it should not only measure the cost of the operation but also how likely it is to match?

I don't agree. You can already get this with Scorer.getApproximation().cost()/Scorer.cost().

> Add matchCost() api to TwoPhaseDocIdSetIterator
> -----------------------------------------------
>                 Key: LUCENE-6276
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
> We could add a method like TwoPhaseDISI.matchCost() defined as something like an estimate
> of nanoseconds or similar.
> ConjunctionScorer could use this method to sort its 'twoPhaseIterators' array so that
> cheaper ones are called first. Today it has no idea if one scorer is a simple phrase scorer
> on a short field vs another that might do some geo calculation or more expensive stuff.
> PhraseScorers could implement this based on index statistics (e.g. totalTermFreq/maxDoc)
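
The proposal above could be sketched roughly as follows. This is a minimal illustration, not Lucene code: the TwoPhase interface, the Fixed class, and the helpers phraseMatchCost() and cheapestFirst() are hypothetical names invented for this example. It also assumes the cost unit is "expected positions read per document" (totalTermFreq/docFreq summed over the phrase terms) rather than the nanosecond estimate suggested in the description.

```java
import java.util.ArrayList;
import java.util.List;

final class MatchCostSketch {

    /** Hypothetical stand-in for a two-phase iterator; not the Lucene class. */
    interface TwoPhase {
        /** Estimated cost of one match check, in arbitrary (assumed) units. */
        float matchCost();
    }

    /** Hypothetical iterator with a precomputed cost, used for illustration only. */
    static final class Fixed implements TwoPhase {
        final String name;
        final float cost;

        Fixed(String name, float cost) {
            this.name = name;
            this.cost = cost;
        }

        @Override
        public float matchCost() {
            return cost;
        }
    }

    /**
     * Assumed phrase cost: the sum over terms of the average number of
     * positions per matching doc (totalTermFreq / docFreq).
     */
    static float phraseMatchCost(long[] totalTermFreq, long[] docFreq) {
        float sum = 0f;
        for (int i = 0; i < totalTermFreq.length; i++) {
            if (docFreq[i] > 0) {
                sum += (float) totalTermFreq[i] / docFreq[i];
            }
        }
        return sum;
    }

    /** Returns a copy of the input ordered so that cheaper iterators come first. */
    static <T extends TwoPhase> List<T> cheapestFirst(List<T> iterators) {
        List<T> sorted = new ArrayList<>(iterators);
        sorted.sort((a, b) -> Float.compare(a.matchCost(), b.matchCost()));
        return sorted;
    }
}
```

With these assumptions, a two-term phrase whose terms average 3 and 2 positions per doc gets a cost of 5 and would be checked before a hypothetical geo filter assigned a cost of 1000.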

This message was sent by Atlassian JIRA
