lucene-solr-user mailing list archives

From Peter Spam <ps...@mac.com>
Subject Re: Tips for getting unique results?
Date Thu, 07 Apr 2011 16:54:12 GMT
The data are fine and not duplicated. However, I want to analyze the data and summarize one
field (kind of like faceting) to find out what its largest value is.

For example:

Document 1:   label=1A1A1; body="adfasdfadsfasf"
Document 2:   label=5A1B1; body="adfaasdfasdfsdfadsfasf"
Document 3:   label=1A1A1; body="adasdfasdfasdffaasdfasdfsdfadsfasf"
Document 4:   label=7A1A1; body="azxzxcvdfaasdfasdfsdfadsfasf"
Document 5:   label=7A1A1; body="azxzxcvdfaasdfasdfsdasdaaaaafadsfasf"
Document 6:   label=5A1B1; body="adfaasdfasdfsdfadsfasfzzz"

How do I get back just ONE hit for the largest "label" value?

In other words, what query will return the 7A1A1 label just once?  If I search for q=* and
sort the results, it works, except I get back multiple hits for each label.  If I do a facet,
I can only sort in increasing order, when what I want is decreasing order.
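
One possible workaround, sketched here in Python and not confirmed anywhere in this thread: ask Solr for the facet values of "label" in index order with no documents returned, then reverse the list on the client. The helper names facet_params and largest_labels are hypothetical. The sketch assumes "label" is an untokenized string field, that the response uses the default flat facet layout (value, count, value, count, ...), and that Python's string ordering matches the index order for these ASCII values.

```python
from urllib.parse import urlencode


def facet_params(field):
    """Build a query string asking for every distinct value of `field`.

    rows=0 suppresses the document list, so only facet values come back.
    """
    return urlencode({
        "q": "*:*",
        "rows": 0,
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.sort": "index",   # ascending index order, the only direction available
        "facet.mincount": 1,     # skip values that no document carries
        "facet.limit": -1,       # no cap on the number of values returned
    })


def largest_labels(flat_facets, n=1):
    """Return the n largest distinct values from a flat [value, count, ...] list."""
    values = flat_facets[0::2]            # drop the counts, keep the values
    return sorted(values, reverse=True)[:n]


# Facet list as it might look for the six example documents above.
flat = ["1A1A1", 2, "5A1B1", 2, "7A1A1", 2]
print(facet_params("label"))
print(largest_labels(flat))       # ['7A1A1']
print(largest_labels(flat, n=2))  # ['7A1A1', '5A1B1']
```

This still transfers every distinct label value to the client, though not every document, so it may only be reasonable when the number of distinct labels is modest.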


-Pete
 
On Apr 6, 2011, at 10:22 PM, Otis Gospodnetic wrote:

> Hi,
> 
> I think you are saying dupes are the main problem?  If so, 
> http://wiki.apache.org/solr/Deduplication ?
> 
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> ----- Original Message ----
>> From: Peter Spam <pspam@mac.com>
>> To: solr-user@lucene.apache.org
>> Sent: Thu, April 7, 2011 1:13:44 AM
>> Subject: Tips for getting unique results?
>> 
>> Hi,
>> 
>> I have documents with a field that has "1A2B3C" alphanumeric characters.  I
>> can query for * and sort results based on this field, however I'd like to
>> "uniq" these results (remove duplicates) so that I can get the 5 largest unique
>> values.  I can't use the StatsComponent because my values have letters in them
>> too.
>> 
>> Faceting (and ignoring the counts) gets me half of the way there, but I can
>> only sort ascending.  If I could also sort facet results descending, I'd be
>> done.  I'd rather not return all documents and just parse the last few results
>> to work around this.
>>
>> Any ideas?
>> 
>> 
>> -Pete
>> 

