lucene-dev mailing list archives

From Michael McCandless
Subject Re: Proposal: Scorer api change
Date Wed, 09 Jun 2010 10:02:20 GMT
I generally don't trust the compiler, if/when I have that freedom.

If you can fix a hotspot in Lucene to avoid an extra method call, an
extra add/multiply, etc., you should.  Doing so ensures the cost can't
be there.  Not doing so means you rely on the JRE to be smart enough,
and it very easily may not be (there are so many variables), and that
also makes Lucene's performance more fragile/env-specific.

Why take that chance?

I also don't rely on benchmarks to validate this on a case-by-case
basis; the cost for any single change (like this one) can easily be in
the noise, yet these micro-costs do add up.

Different rules apply "down low".  It's like quantum physics!

I think, besides avoiding method calls, there are compelling reasons
to consider a stronger decoupling of matching & scoring.  A Query
really ought to be two separable things -- matching (like Filter) and
scoring.

EG DisjunctionMaxQuery has its own matching code that duplicates what
BooleanQuery does if the query is all SHOULD clauses.  Why duplicate
this code?  Why restrict the "max score of all subs = doc's score" to
only SHOULD-only BooleanQueries?  If we had full matching/scoring
decoupling, we wouldn't have to.
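
To make the idea concrete, here is a minimal sketch of one shared OR-matching
loop whose scoring is supplied from outside. Everything in it is hypothetical
(DecoupledDisjunction, Clause, search and demoClauses are illustration names,
not Lucene APIs), and the per-doc scores are simply precomputed arrays. The
point is only that "sum of subs" and "max of subs" can reuse the same matching
code:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch: none of these types are Lucene APIs.
final class DecoupledDisjunction {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** One sub-clause: sorted doc IDs plus a precomputed score per doc. */
    static final class Clause {
        private final int[] docs;
        private final double[] scores;
        private int pos = 0;

        Clause(int[] docs, double[] scores) {
            this.docs = docs;
            this.scores = scores;
        }

        int docID() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        double score() { return scores[pos]; }
        void next() { pos++; }
    }

    /**
     * Matching: a single OR loop over all clauses, in doc ID order.
     * Scoring: the caller-supplied combine function merges sub-scores.
     */
    static Map<Integer, Double> search(List<Clause> clauses, DoubleBinaryOperator combine) {
        Map<Integer, Double> hits = new LinkedHashMap<>();
        while (true) {
            // The next match is the smallest current doc ID across clauses.
            int doc = NO_MORE_DOCS;
            for (Clause c : clauses) doc = Math.min(doc, c.docID());
            if (doc == NO_MORE_DOCS) return hits;

            boolean first = true;
            double score = 0.0;
            for (Clause c : clauses) {
                if (c.docID() != doc) continue;
                score = first ? c.score() : combine.applyAsDouble(score, c.score());
                first = false;
                c.next();
            }
            hits.put(doc, score);
        }
    }

    /** Builds fresh demo clauses; iterating consumes them. */
    static List<Clause> demoClauses() {
        return Arrays.asList(
            new Clause(new int[] {1, 2, 5}, new double[] {1.0, 1.0, 0.5}),
            new Clause(new int[] {2, 3}, new double[] {2.0, 0.25}));
    }

    public static void main(String[] args) {
        System.out.println("sum: " + search(demoClauses(), Double::sum));
        System.out.println("max: " + search(demoClauses(), Math::max));
    }
}
```

Both calls return the same set of matching docs; only the score of a doc
matched by more than one clause differs.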

Or, eg the BM25 patch (LUCENE-2091) had to create its own
BM25BooleanQuery to do matching & scoring, which is silly -- if it's
only changing how scoring works, it should be able to reuse the
existing matching code in BooleanQuery.

That said, there are challenges; eg the higher performance
BooleanScorer (which scores docs in "chunks" and is free to collect
them out-of-order) would be challenging to fully decouple from scoring
since it's not strictly "doc-at-once".
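
A rough sketch of why chunked collection resists decoupling, under simplifying
assumptions: the window size, the bucket arrays and the name ChunkedOrSketch
are all made up for illustration, and this is not Lucene's actual
BooleanScorer. Each clause is drained into a per-window bucket table, so
scores accumulate clause-at-a-time and docs come out in first-touch order
rather than doc ID order:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of chunked OR scoring; not Lucene's BooleanScorer.
final class ChunkedOrSketch {
    static final int CHUNK = 8; // tiny window, for illustration only

    /**
     * clauseDocs[c] holds sorted doc IDs (all < maxDoc) for clause c, and
     * clauseScores[c] the matching per-doc scores. Returns doc IDs in the
     * order they are collected; scoresOut[doc] receives the summed score.
     */
    static List<Integer> collect(int[][] clauseDocs, double[][] clauseScores,
                                 int maxDoc, double[] scoresOut) {
        List<Integer> collected = new ArrayList<>();
        int[] pos = new int[clauseDocs.length];
        double[] bucketScore = new double[CHUNK];
        boolean[] bucketUsed = new boolean[CHUNK];

        for (int base = 0; base < maxDoc; base += CHUNK) {
            int end = base + CHUNK;
            List<Integer> touched = new ArrayList<>();

            // Clause-at-a-time within the window: matching and score
            // accumulation happen in the same pass.
            for (int c = 0; c < clauseDocs.length; c++) {
                while (pos[c] < clauseDocs[c].length && clauseDocs[c][pos[c]] < end) {
                    int slot = clauseDocs[c][pos[c]] - base;
                    if (!bucketUsed[slot]) {
                        bucketUsed[slot] = true;
                        bucketScore[slot] = 0.0;
                        touched.add(slot);
                    }
                    bucketScore[slot] += clauseScores[c][pos[c]];
                    pos[c]++;
                }
            }

            // Flush buckets in first-touch order, which need not be doc ID order.
            for (int slot : touched) {
                collected.add(base + slot);
                scoresOut[base + slot] = bucketScore[slot];
                bucketUsed[slot] = false;
            }
        }
        return collected;
    }
}
```

In this sketch a doc's score is only complete once every clause has been
drained for its window, so there is no single moment where a separate scorer
could be handed "the current doc" the way a doc-at-once matcher would allow.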

On the other part of the proposal (allowing .score() to take an
arbitrary docID), that does sound like a can of worms.  MG4J's model
(scorer receives the full "state" of the matcher and can peek in as
necessary) sounds compelling...


On Wed, Jun 9, 2010 at 3:35 AM, Earwin Burrfoot wrote:
> Lies, lies, lies :)
> I mean, Sun JIT is overrelied on. Especially in regards to inlining.
> But, there are some cases when you can trust it. I.e. if you call a
> virtual method and this exact call-site gets refs to different objects
> at runtime (meaning here - you wrap different Queries in your
> WrapperQuery) - you can definitely rely on a call not being inlined.
> So, I agree with John on his /rough/ overhead estimates, on the part
> that it exists, and it's detectable. I don't agree on allowing
> arbitrary doc scoring. People who really need this for some strange
> applications can emulate it now by advancing the scorer to the needed
> doc and calling score(). But for most people it's unnecessary, and as
> I said - it will lead to scaaary code.
> If you really think that one or two method calls in a loop are
> negligible, I ask you to join my holy crusade and erase the
> Scorer.score(Collector) set of methods :) they exist there for the
> sole purpose of cutting out a few method calls, and are really,
> really, really confusing.
> 2010/6/9 Shai Erera:
>> I don't think the method call is an overhead, John. You don't need to
>> reiterate it. The compiler does make optimizations and inlines such
>> code/calls if it can. More than that, query processing involves so many
>> method calls that I do think that's insignificant.
> Woohoo! Mexican standoff! :)
> --
> Kirill Zakharenko/Кирилл Захаренко (
> Phone: +7 (495) 683-567-4
> ICQ: 104465785
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
