lucene-java-user mailing list archives

From Michael McCandless
Subject Re: question about Scorer.freq()
Date Mon, 04 Oct 2010 17:12:46 GMT
On Mon, Oct 4, 2010 at 7:32 AM, Koji Sekiguchi wrote:
> Hi Mike,
>> Hmm are you only gathering the MUST_NOT TermScorers?  (In which case
>> I'd expect that the .docID() would not match the docID being
>> collected).  Or do you also see .docID() not matching for SHOULD and
>> MUST sub queries?
> The snippet I copied and pasted in my previous mail was not appropriate.
> Sorry for confusing you. Please see the whole program attached
> to this mail.


>> Also, are you sure you are getting BooleanScorer2?
> Yes and no. I confirmed that I got BooleanScorer2 in my setScorer(),
> but as I said I'm interested in TermScorer rather than BooleanScorer2
> because I want to know in which field a match occurred. Or am I missing
> something here?

Got it.  No, that's correct.  The top scorer is BS2 (BooleanScorer2) and
the subs are TSs (TermScorers), so you should interact directly with the
TSs to find the field, freq, etc. for the current doc.

>> And, yes, you should be able to get which field a match occurred in,
>> because at the lowest level the atomic queries (TermQuery, PhraseQuery,
>> SpanTermQuery, AutomatonQuery, etc.) all operate on a single field.  So
>> when you find a sub that "matches", you should just check the field of
>> that query.
> That is what I wanted, but docID() from the sub scorers didn't match...


>> Hmm... but not all queries make it easy/possible to get the field,
>> right?  MultiTermQuery has getField, TermQuery has getTerm, but
>> PhraseQuery doesn't have a .getField (oh, but you can call .getTerms()
>> and then get the field).
> I agree, though for a simple PoC, I'm only interested in TermQuery in
> the following program.


So... this looks like an issue with how DisjunctionSumScorer works.
Apparently, and unfortunately, it advances the sub scorers beyond the
current doc, and also aggregates the scores.  (ConjunctionScorer and
ReqOptScorer seem to work correctly -- they leave the subs on the
current doc, and delay calling .score() on the subs until the
collector asks for it.)
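
To make the difference concrete, here is a toy model in plain Java.  It
is NOT Lucene code: ToySubScorer, eagerDisjunctionNext and
conjunctionNext are hypothetical names made up for this sketch, and the
real scorers are more involved.  It only illustrates why a sub's
docID() can already be past the doc being collected under an eager
disjunction, while a conjunction leaves every sub on that doc.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only; none of these names are Lucene APIs.
final class SubScorerPositionSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a term scorer: walks a sorted list of doc IDs. */
    static final class ToySubScorer {
        private final int[] docs;
        private int idx = -1;

        ToySubScorer(int... docs) { this.docs = docs; }

        int docID() {
            if (idx < 0) return -1;                      // not positioned yet
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int nextDoc() { idx++; return docID(); }
    }

    /** Eager disjunction: subs that match the returned doc are moved past it. */
    static int eagerDisjunctionNext(List<ToySubScorer> subs) {
        int min = NO_MORE_DOCS;
        for (ToySubScorer s : subs) {
            if (s.docID() == -1) s.nextDoc();            // first call: position the sub
            min = Math.min(min, s.docID());
        }
        if (min != NO_MORE_DOCS) {
            for (ToySubScorer s : subs) {
                if (s.docID() == min) s.nextDoc();       // sub leaves the doc being reported
            }
        }
        return min;
    }

    /** Conjunction: leapfrog until all subs sit on the same doc, and leave them there. */
    static int conjunctionNext(List<ToySubScorer> subs) {
        int target = subs.get(0).nextDoc();
        boolean aligned = false;
        while (target != NO_MORE_DOCS && !aligned) {
            aligned = true;
            for (ToySubScorer s : subs) {
                while (s.docID() < target) s.nextDoc();
                if (s.docID() > target) {                // overshoot: retry with a new target
                    target = s.docID();
                    aligned = false;
                    break;
                }
            }
        }
        return target;
    }

    /** Counts the subs whose docID() equals the doc handed to the collector. */
    static int subsOnDoc(List<ToySubScorer> subs, int doc) {
        int n = 0;
        for (ToySubScorer s : subs) {
            if (s.docID() == doc) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<ToySubScorer> or = Arrays.asList(new ToySubScorer(1, 3), new ToySubScorer(3, 4));
        int d = eagerDisjunctionNext(or);
        System.out.println("OR  doc=" + d + " subs on doc=" + subsOnDoc(or, d));

        List<ToySubScorer> and = Arrays.asList(new ToySubScorer(1, 3), new ToySubScorer(3, 4));
        d = conjunctionNext(and);
        System.out.println("AND doc=" + d + " subs on doc=" + subsOnDoc(and, d));
    }
}
```

In this model the disjunction reports doc 1 with no sub positioned on
it, whereas the conjunction reports doc 3 with both subs still on doc
3, which is the state a collector needs in order to ask a sub for its
field or freq.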

I'll open an issue for this.

