lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Sorting by Score
Date Wed, 28 Feb 2007 13:24:37 GMT
Empirically, when I insert the elements into the FieldSortedHitQueue,
they get sorted according to the Sort object. The original query
that gives me a TopDocs applied no secondary sorting, only relevancy.
Since I had normalized all the scores into one of only 5 discrete
values, the secondary sort was applied to all docs with the same
score when I inserted them into the FieldSortedHitQueue.

Now popping things off the FieldSortedHitQueue yields them in the
order I want.

You could just operate on the FieldSortedHitQueue at this point, but
I decided the rest of my code would be simpler if I stuffed them back
into the TopDocs, so there's some explanation below that you can
just skip if I've cleared things up already.....

*****************
The step I left out is moving the documents from the
FieldSortedHitQueue back to topDocs.scoreDocs.
So the steps are as follows:

1> "bucketize" the scores. That is, go through the
TopDocs.scoreDocs and adjust each raw score into
one of my buckets. This is made easy by the
existence of topDocs.getMaxScore. TopDocs has
had no sorting other than relevancy applied so far.

2> assemble the FieldSortedHitQueue by inserting
each element from scoreDocs into it, with a suitable
Sort object whose first field is relevance (SortField.FIELD_SCORE).

3> pop the entries off the FieldSortedHitQueue, overwriting
the elements in topDocs.scoreDocs.

I left out step <3>, although I suppose you could
operate directly on the FieldSortedHitQueue.
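
For anyone who wants to see the shape of it, here is a minimal sketch of
the three steps in plain Java. It is not my actual Lucene code: Hit,
bucketize and resort are hypothetical names, a java.util.PriorityQueue
stands in for the FieldSortedHitQueue, a single string field stands in
for the secondary sort, and the ceiling-based bucket formula is just one
assumed way to map raw scores onto a handful of discrete values.

```java
import java.util.PriorityQueue;
import java.util.Comparator;

class BucketedSort {
    // Hypothetical stand-in for a ScoreDoc plus one secondary sort value.
    static final class Hit {
        final int doc;
        float score;
        final String title;

        Hit(int doc, float score, String title) {
            this.doc = doc;
            this.score = score;
            this.title = title;
        }
    }

    // Step 1: snap each raw score to one of `buckets` discrete values in (0, 1].
    // The formula is an assumption, not something prescribed by Lucene.
    static void bucketize(Hit[] hits, float maxScore, int buckets) {
        for (Hit h : hits) {
            int b = (int) Math.ceil((h.score / maxScore) * buckets);
            b = Math.max(1, Math.min(buckets, b)); // clamp to a valid bucket
            h.score = (float) b / buckets;
        }
    }

    // Steps 2 and 3: insert every hit into a queue ordered by
    // (bucketed score descending, then title ascending), then pop the
    // entries back into the original array, overwriting it in sorted order.
    static void resort(Hit[] hits) {
        Comparator<Hit> order = (a, b) -> {
            int byScore = Float.compare(b.score, a.score); // higher bucket first
            return byScore != 0 ? byScore : a.title.compareTo(b.title);
        };
        PriorityQueue<Hit> queue = new PriorityQueue<>(Math.max(1, hits.length), order);
        for (Hit h : hits) {
            queue.add(h);
        }
        for (int i = 0; i < hits.length; i++) {
            hits[i] = queue.poll();
        }
    }
}
```

Calling bucketize and then resort on the same array leaves it ordered by
bucket first and by the secondary field within each bucket, which is the
effect described in the steps above.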

NOTE: in my case, I just put everything back in the
scoreDocs without attempting any efficiencies. If I
needed more performance, I'd only put as many items
back as I needed to display. But as I wrote yesterday,
performance isn't an issue so there's no point. Although
I know one place to look if we need to squeeze more QPS.
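
A rough illustration of that "only put back what you display" idea,
again in plain Java rather than Lucene: popTop is a hypothetical helper
and java.util.PriorityQueue is an assumed stand-in for the
FieldSortedHitQueue. It pops at most n entries and leaves the rest of
the queue alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class TopNFromQueue {
    // Pops at most n entries, in queue order, into a new list.
    // Entries beyond the first n stay in the queue untouched.
    static <T> List<T> popTop(PriorityQueue<T> queue, int n) {
        List<T> out = new ArrayList<>();
        while (out.size() < n && !queue.isEmpty()) {
            out.add(queue.poll());
        }
        return out;
    }
}
```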

How efficient this is is an open question. But it's "fast enough"
and relatively simple so I stopped looking for more
efficiencies....

Erick

On 2/28/07, Chris Hostetter <hossman_lucene@fucit.org> wrote:
>
>
> : The first part was just to iterate through the TopDocs that's available
> : to me and normalize the scores right in the ScoreDocs. Like this...
>
> Won't that be done after Lucene does the hit collecting/sorting? ... he
> wants the "bucketing" to happen as part of the scoring so that the
> secondary sort will determine the ordering within the bucket.
>
> (or am i missing something about your description?)
>
>
>
>
> -Hoss
