lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring
Date Fri, 13 Oct 2017 15:22:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203693#comment-16203693 ]

Robert Muir commented on LUCENE-4100:
-------------------------------------

yeah, I think I am looking at it from the top down (IndexSearcher) vs. the bottom up (queries).

IndexSearcher already knows whether scores are needed (e.g. from the Sort), but there is no
way to tell it that an approximate total hit count is acceptable. If we can do that, then I
think we can make early termination really easy for the sorted case, the index-order case,
and also this maxscore case.
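
To make the top-down idea concrete, here is a minimal sketch for the index-order case only. It is a toy model and not Lucene code: `ApproxCountSketch`, `Result`, `search`, and the `exactCountNeeded` flag are all hypothetical names made up for illustration. The assumption is that once the caller says an approximate count is fine, the searcher can stop as soon as the top k is full, because in index order no later doc can displace an earlier one.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the idea above. Not Lucene code; every name here is hypothetical. */
final class ApproxCountSketch {

    /** Collected doc ids plus a hit count that may only be a lower bound. */
    static final class Result {
        final List<Integer> topDocs;
        final int hitCount;
        final boolean countIsExact;

        Result(List<Integer> topDocs, int hitCount, boolean countIsExact) {
            this.topDocs = topDocs;
            this.hitCount = hitCount;
            this.countIsExact = countIsExact;
        }
    }

    /**
     * Collects the first k matches in index order.
     * If exactCountNeeded is false, stops as soon as the top k is full.
     */
    static Result search(int[] matchingDocs, int k, boolean exactCountNeeded) {
        List<Integer> top = new ArrayList<>();
        int count = 0;
        for (int doc : matchingDocs) {
            if (top.size() < k) {
                top.add(doc);
            } else if (!exactCountNeeded) {
                // Top k is full and the caller accepts an approximate count:
                // terminate early and report the count seen so far as a lower bound.
                return new Result(top, count, false);
            }
            count++;
        }
        // Every match was visited, so the count is exact.
        return new Result(top, count, true);
    }
}
```

With `exactCountNeeded` set to true the loop has to visit every match just to count it, which is exactly the work the comment above wants to make optional.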

{quote}
Ideally we would not need another parameter on Query.createWeight for MAXSCORE either, but
the issue is that depending on whether you want to collect all hits or only the top-scoring
ones, then we need different Scorer impls.
{quote}

We do? (genuine question). We added needsScores because scorers previously always had to be
ready for you to "lazily" call score(), and this prevented them from doing much more interesting
things up front, like caching whole bitsets. But is that really the case for maxscore? I'm just
asking because the new scorer here looks a hell of a lot like a disjunction scorer :) If we
truly need a different impl, we should maybe still think it through, because of stuff like the
setMinCompetitiveScore() method, which would make no sense except in that case. I do like that
AssertingScorer in your patch checks all that stuff, but there it is a bit confusing.
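
For readers who have not looked at the patch, here is a rough sketch of why a setMinCompetitiveScore() hook only makes sense when collecting top hits. It is a self-contained toy, not the patch's code: `ToyScorer`, the per-doc `upperBounds` array, and `topK` are hypothetical, and only the method name is borrowed from the discussion above. The assumption is that the scorer has some cheap upper bound per doc and may skip docs whose bound falls below what the top-k heap still accepts.

```java
import java.util.PriorityQueue;

/** Toy model of a min-competitive-score hook. Not Lucene code; names are hypothetical. */
final class MinCompetitiveSketch {

    /** Stand-in for a scorer that can be told which scores still matter. */
    static final class ToyScorer {
        private final float[] upperBounds; // cheap per-doc upper bound on the score
        private final float[] scores;      // actual score, assumed expensive to compute
        private float minCompetitiveScore = 0f;
        private int doc = -1;
        int scoredDocs = 0;                // how many docs were fully scored

        ToyScorer(float[] upperBounds, float[] scores) {
            this.upperBounds = upperBounds;
            this.scores = scores;
        }

        void setMinCompetitiveScore(float min) {
            minCompetitiveScore = min;
        }

        /** Advances to the next doc whose upper bound is still competitive, or returns -1. */
        int nextDoc() {
            while (++doc < scores.length) {
                if (upperBounds[doc] >= minCompetitiveScore) {
                    return doc;
                }
            }
            return -1;
        }

        float score() {
            scoredDocs++;
            return scores[doc];
        }
    }

    /** Collects the k highest scores, feeding the heap's minimum back to the scorer. */
    static float[] topK(ToyScorer scorer, int k) {
        PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap of the current top k
        for (int d = scorer.nextDoc(); d != -1; d = scorer.nextDoc()) {
            float s = scorer.score();
            if (heap.size() < k) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll();
                heap.add(s);
            }
            if (heap.size() == k) {
                // Only a top-k collector has a threshold to report.
                scorer.setMinCompetitiveScore(heap.peek());
            }
        }
        float[] out = new float[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll(); // heap yields ascending, so fill from the back
        }
        return out;
    }
}
```

A collector that wants all hits never has a threshold to report, so it would never call the hook, and the scorer would behave like a plain disjunction.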


> Maxscore - Efficient Scoring
> ----------------------------
>
>                 Key: LUCENE-4100
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4100
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs, core/query/scoring, core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Stefan Pohl
>              Labels: api-change, gsoc2014, patch, performance
>             Fix For: 4.9, 6.0
>
>         Attachments: LUCENE-4100.patch, LUCENE-4100.patch, contrib_maxscore.tgz, maxscore.patch
>
>
> At Berlin Buzzwords 2012, I will be presenting 'maxscore', an efficient algorithm first
> published in the IR domain in 1995 by H. Turtle & J. Flood, that I find deserves more
> attention among Lucene users (and developers).
> I implemented a proof of concept and did some performance measurements with example queries
> and lucenebench, the package of Mike McCandless, resulting in very significant speedups.
> This ticket is to get the discussion started on including the implementation in Lucene's
> codebase. Because the technique requires awareness of it from the Lucene user/developer,
> it seems best for it to become a contrib/module package so that it can be consciously chosen.



