lucene-solr-user mailing list archives

From Luis Neves <luis.ne...@co.sapo.pt>
Subject Re: result grouping?
Date Thu, 04 Jan 2007 15:15:23 GMT
Yonik Seeley wrote:
> On 1/3/07, Ryan McKinley <ryantxu@gmail.com> wrote:
>> thanks.  Yes, the presentation layer could group results, but that is
>> not practical if i want to show the first 20 results out of 200,000
>> matches.
>>
>> Nutch groups the results by site.  Any idea how they do it?
> 
> Good question.
> Off the top of my head, one could use a priority queue that can change
> its size dynamically.  One could increment a group count for each hit
> (like faceted search with the FieldCache) and if the group count
> exceeds "n", then you increment the size of the priority queue to
> allow an additional item to be collected to compensate.
> 
> -Yonik
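
[Editor's sketch] The idea quoted above can be illustrated roughly as follows. This is plain Python, not Lucene or Solr code. The names `collect_grouped`, `group_of`, `rows` and `per_group` are hypothetical, and the `group_of` dictionary only stands in for a FieldCache-style lookup from document to group value. It assumes hits arrive as (score, doc_id) pairs in arbitrary order.

```python
import heapq
from collections import defaultdict


def collect_grouped(hits, group_of, rows, per_group):
    """Return up to `rows` doc ids, best first, with at most `per_group` per group.

    hits      -- iterable of (score, doc_id) pairs, in any order
    group_of  -- dict mapping doc_id to its group key (stand-in for a field cache)
    """
    heap = []                 # min-heap; heap[0] is the weakest hit still kept
    capacity = rows           # queue size, grown whenever a group overflows
    seen = defaultdict(int)   # hits counted so far per group

    for score, doc_id in hits:
        group = group_of[doc_id]
        seen[group] += 1
        if seen[group] > per_group:
            # This group already has its quota, so make room for one more
            # item to compensate for a hit that may later be collapsed away.
            capacity += 1
        if len(heap) < capacity:
            heapq.heappush(heap, (score, doc_id))
        elif (score, doc_id) > heap[0]:
            heapq.heapreplace(heap, (score, doc_id))

    # Final pass: walk the collected hits best first and apply the quota.
    kept = defaultdict(int)
    page = []
    for score, doc_id in sorted(heap, reverse=True):
        group = group_of[doc_id]
        if kept[group] < per_group:
            kept[group] += 1
            page.append(doc_id)
            if len(page) == rows:
                break
    return page


group_of = {"a1": "a.com", "a2": "a.com", "a3": "a.com",
            "b1": "b.com", "b2": "b.com", "c1": "c.com"}
hits = [(0.9, "a1"), (0.8, "a2"), (0.7, "a3"),
        (0.6, "b1"), (0.5, "c1"), (0.4, "b2")]
print(collect_grouped(hits, group_of, rows=3, per_group=1))
```

One caveat of this sketch: the queue grows by one for every hit beyond a group's quota, so when a few groups dominate the matches it can end up holding far more than `rows` items.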

You might as well say that I have to change the dilithium crystals in the flux 
capacitor :-)

One of the reasons I like Solr so much is because I get impressive results 
without having to know Lucene, which is something that will have to change 
because I also need this feature.

Not knowing much about the internals of Solr/Lucene, I had a look at the Facet 
code in search of ideas, but from what I could see the facet counts are 
calculated after the Documents are added to the response. It seems to me that 
any kind of grouping has to be done before that... right?

Could you explain in more detail where I should look?

Can the TopFieldDocCollector/TopFieldDocs classes be used to this end?

I'm immersing myself in Lucene, but it will take some time.

Side note: Over here, besides Solr, we also use the "FAST" search platform, and 
they call this feature "Field collapsing":
<http://www.fastsearch.com/glossary.aspx?m=48&amid=299>
I like the syntax they use:
"&collapseon=<fieldname>&collapsenum=N" -> Collapse, but keep N number of
collapsed documents
For some reason they can only collapse on numeric fields (int32).

Regards,
Luis Neves

