What do you mean "we are not inlining"? The compiler inlines methods .. at least it tries.

Shai

On Tue, Jun 8, 2010 at 8:21 PM, John Wang <john.wang@gmail.com> wrote:
Shai:

    method call overhead in this case is not insignificant because it is in a very tight loop, and no, compiler cannot optimize it for you, we are not inline-ing cuz we are in a java world.

     You are right, this breaks backward compatibility. But from 2.4 - 2.9, we have done MUCH worse. :)

-John


On Tue, Jun 8, 2010 at 10:09 AM, Shai Erera <serera@gmail.com> wrote:
I guess I must be missing something fundamental here :).

If Scorer is defined as you propose, and I create my Scorer which impls getDISI() as "return this" - what do I lose? What's wrong w/ Scorer already being a DISI?

You mention "it is just inefficient to pay the method call overhead ..." - what overhead? Are you talking about the decorator delegating the call to the wrapped scorer? I really think the compiler can handle that, no? Especially if you make your nextDoc/advance final (which probably you should) ...

That doesn't seem to justify an API change, break bw completely (even if we do it in 4.0 only) and change all the current Scorers ...

Shai


On Tue, Jun 8, 2010 at 8:01 PM, John Wang <john.wang@gmail.com> wrote:
re: But Scorer is itself an iterator, so what prevents you from calling

nextDoc and advance on it without score()

Nothing. It is just inefficient to pay the method call overhead just to overload score.

re: If I were in your shoes, I'd simply provider a Query wrapper. If CSQ

is not good enough I'd just develop my own.

That is what I am doing. I am just proposing the change (see my first email) as an improvement.

re: Scorer is itself an iterator

yes, that is the current definition. The point of the proposal is to make this change.

-John

On Tue, Jun 8, 2010 at 9:45 AM, Shai Erera <serera@gmail.com> wrote:
Well … I don't know the reason as well and always thought Scorer and
Similarity are confusing.

But Scorer is itself an iterator, so what prevents you from calling
nextDoc and advance on it without score(). And what would the returned
DISI do when nextDoc is called, if not delegate to its subs?

If I were in your shoes, I'd simply provider a Query wrapper. If CSQ
is not good enough I'd just develop my own.

But perhaps others think differently?

Shai

On Tuesday, June 8, 2010, John Wang <john.wang@gmail.com> wrote:
> Hi Shai:
>     I am not sure I understand how changing Similarity would solve this problem, wouldn't you need the reader?
>     As for PayloadTermQuery, payload is not always the most efficient way of storing such data, especially when number of terms << numdocs. (I am not sure accessing the payload when you iterate is a good idea, but that is another discussion)
>
>     Yes, what I described is exactly a simple CustomScoreQuery for a special use-case. The problem is also in CustomScoreQuery, where nextDoc and advance are calling the sub-scorers as a wrapper. This can be avoided if the Scorer returns an iterator instead.
>
>     Separating scoring and doc iteration is a good idea anyway. I don't know the reason to combine them originally.
> Thanks
> -John
>
>
> On Tue, Jun 8, 2010 at 8:47 AM, Shai Erera <serera@gmail.com> wrote:
>
> So wouldn't it make sense to add some method to Similarity? Which receives the doc Id in question maybe ... just thinking here.
>
> Factoring Scorer like you propose would create 3 objects for scoring/iterating: Scorer (which really becomes an iterator), Similarity and CustomScoreFunction ...
>
> Maybe you can use CustomScoreQuery? or PayloadTermQuery? depends how you compute your age decay function (where you pull the data about the age of the document).
>
> Shai
>
>
> On Tue, Jun 8, 2010 at 6:41 PM, John Wang <john.wang@gmail.com> wrote:
> Hi Shai:
>     Similarity in many cases is not sufficient for scoring. For example, to implement age decaying of a document (very useful for corpuses like news or tweets), you want to project the raw tfidf score onto a time curve, say f(x), to do this, you'd have a custom scorer that decorates the underlying scorer from your say, boolean query:
>
>
>
> public float score(){    return myFunc(innerScorer.score());}
>     This is fine, but then you would have to do this as well:
> public int nextDoc(){
>
>
>    return innerScorer.nextDoc();}
> and also:
> public int advance(int target){   return innerScorer.advance();}      The difference here is that nextDoc and advance are called far more times as score. And you are introducing an extra method call for them, which is not insignificant for queries result in large recall sets.
>
>
>
> Hope this makes sense.
> Thanks
> -John
> On Tue, Jun 8, 2010 at 5:02 AM, Shai Erera <serera@gmail.com> wrote:
> I'm not sure I understand what you mean - Scorer is a DISI itself, and the scoring formula is mostly controlled by Similarity.
>
> What will be the benefits of the proposed change?
>
> Shai
>
> On Tue, Jun 8, 2010 at 8:25 AM, John Wang <john.wang@gmail.com> wrote:
>
>
>
>
> Hi guys:
>
>     I'd like to make a proposal to change the Scorer class/api to the following:
>
>
> public abstract class Scorer{
>    DocIdSetIterator getDocIDSetIterator();
>    float score(int docid);
> }
>
> Reasons:
>
> 1) To build a Scorer from an existing Scorer (e.g. that produces raw scores from tfidf), one would decorate it, and it would introduce overhead (in function calls) around nextDoc and advance, even if you just want to augment the score method which is called much fewer times.
>
> 2) The current contract forces scoring on the currentDoc in the underlying iterator. So once you pass "current", you can no longer score. In one of our use-cases, it is very inconvenient.
>
> What do you think? I can go ahead and open an issue and work on a patch if I get some agreement.
>
> Thanks
>
> -John
>
>
>
>
>
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org