lucene-solr-user mailing list archives

From Luis Neves <>
Subject Re: result grouping?
Date Fri, 05 Jan 2007 11:28:52 GMT
Yonik Seeley wrote:

> There are still some things underspecified though.
> Let's take an example of collapseon=site, collapsenum=2
> The list of un-collapsed matches and their relevancy scores (sort order) 
> is:
> doc=51, site=A, score=100
> doc=52, site=B, score=90
> doc=53, site=C, score=80
> doc=54, site=B, score=70
> doc=55, site=D, score=60
> doc=56, site=E, score=50
> doc=57, site=B, score=40
> doc=58, site=A, score=30
> 1)  If I ask for the top 4 docs, should I get [51,52,53,54] or
> [51,52,54,53].  Are lower ranking docs moved up in the rankings to be
> in their higher ranking "group"?

The docs move up the ranking.
You should get [51,58,52,54] ... or one could make the case that you should get
[51,58,52,54,53,55], which is roughly the behaviour of a SQL "quota-query".
In that case the "top 4" would not refer to the number of documents but to the
number of distinct values of the field you are collapsing on.
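
To make the two readings concrete, here is a minimal Python sketch run over
Yonik's example list. It is only an illustration of the semantics described
above, not Solr or FAST code: the function names (collapse, top_docs,
top_groups) are made up for this example, and it assumes the matches arrive
already sorted by descending score.

```python
# Un-collapsed matches from the example: (doc, site, score), sorted by score.
MATCHES = [
    (51, "A", 100), (52, "B", 90), (53, "C", 80), (54, "B", 70),
    (55, "D", 60), (56, "E", 50), (57, "B", 40), (58, "A", 30),
]


def collapse(matches, collapsenum):
    """Group docs by site, keeping at most `collapsenum` docs per site.

    Groups are ordered by their best-scoring doc, which is simply the order
    of first appearance because the input is sorted by score (dicts keep
    insertion order). Returns a list of groups, each a list of doc ids.
    """
    groups = {}
    for doc, site, _score in matches:
        members = groups.setdefault(site, [])
        if len(members) < collapsenum:
            members.append(doc)
    return list(groups.values())


def top_docs(matches, collapsenum, n):
    """Reading 1: "top n" counts documents after collapsing."""
    flat = [doc for group in collapse(matches, collapsenum) for doc in group]
    return flat[:n]


def top_groups(matches, collapsenum, n):
    """Reading 2 (quota-query style): "top n" counts distinct sites."""
    groups = collapse(matches, collapsenum)[:n]
    return [doc for group in groups for doc in group]


print(top_docs(MATCHES, 2, 4))    # [51, 58, 52, 54]
print(top_groups(MATCHES, 2, 4))  # [51, 58, 52, 54, 53, 55]
```

Under the first reading, asking for the top 3 gives [51, 58, 52], which is the
second of the two answers offered in question 2 below.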

> 2)  If I ask for the top 3 docs, should I get [51,52,53] because those
> are the top 3 scoring docs, or should I get [51,58,52] because
> documents were first grouped and then ranked (and 51 and 58 go
> together)?  Another way of asking this is related to (1): should docs
> outside the "window" be moved up in the rankings to be in their higher
> ranking "group"?

See above.

> 3) Should the number of documents in a "group" change the relevancy?
> Should site=B rank higher than site=A?

I don't think so... don't know if that is what *should* be done, but that's not 
what FAST does.

> 4) Is the collapsing only in the returned results, or just within a
> page of results.  If I ask for docs 4 through 7, should doc 57 be in
> that list or not?

With FAST that is an option. The default behaviour is to remove the collapsed
documents from the result set, so 57 would not be in the list, but you can
choose not to remove them, and in that case they are presented last.
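
A small Python sketch of that option, again over Yonik's example. This is my
reading of the behaviour, not FAST's implementation: the keep_collapsed flag
and the function names are invented for the illustration, and "presented
last" is interpreted as "appended after all the collapsed groups".

```python
# Un-collapsed matches from the example: (doc, site, score), sorted by score.
MATCHES = [
    (51, "A", 100), (52, "B", 90), (53, "C", 80), (54, "B", 70),
    (55, "D", 60), (56, "E", 50), (57, "B", 40), (58, "A", 30),
]


def collapsed_ranking(matches, collapsenum, keep_collapsed=False):
    """Return doc ids in collapsed order.

    Docs beyond `collapsenum` for their site are dropped by default; with
    keep_collapsed=True they are appended at the end instead.
    """
    groups = {}    # site -> kept doc ids, in order of first appearance
    overflow = []  # docs that exceeded the per-site limit
    for doc, site, _score in matches:
        members = groups.setdefault(site, [])
        if len(members) < collapsenum:
            members.append(doc)
        else:
            overflow.append(doc)
    ranking = [doc for members in groups.values() for doc in members]
    return ranking + overflow if keep_collapsed else ranking


def page(ranking, first, last):
    """Slice a ranking using 1-based, inclusive positions."""
    return ranking[first - 1:last]


print(page(collapsed_ranking(MATCHES, 2), 4, 7))
# [54, 53, 55, 56]
print(collapsed_ranking(MATCHES, 2, keep_collapsed=True))
# [51, 58, 52, 54, 53, 55, 56, 57]
```

With this interpretation doc 57 is absent from positions 4 through 7 either
way: it is either removed or pushed to position 8.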

> Defining things to make sense while retaining the ability to page
> through the results seems to be the challenge.

I'm beginning to think that this is a little too complex for a first project
with Lucene. In my particular case all I want is to group results by category
(from a small, predetermined category list), so I think I will just make one
request per category and accept the latency.
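
The one-request-per-category fallback could look roughly like the sketch
below. The `search` callable is a hypothetical stand-in for whatever client
call actually runs the query restricted to one category; nothing here is a
real Solr or Lucene API.

```python
def search_by_category(search, query, categories, rows=10):
    """Run `query` once per category and collect the hits per category.

    `search` is a hypothetical callable taking (query, category, rows) and
    returning a list of hits. The requests run sequentially, so the total
    latency grows with the number of categories.
    """
    results = {}
    for category in categories:
        results[category] = search(query, category, rows)
    return results
```

Since the category list is small and fixed, the extra round trips stay
bounded, which is the trade-off being accepted here.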

Luis Neves
