lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Find duplicates
Date Tue, 02 Dec 2014 16:14:39 GMT
And if I am correct, enabling docValues does this kind of grouping at
index time, as part of the per-segment docValues data structure.
So all one has to do is get it back (through faceting).

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 2 December 2014 at 11:02, Erik Hatcher <erik.hatcher@gmail.com> wrote:
> Sort of… if you indexed the full value of the field (and you’re looking for truly
> exact matches) as a string field type you could facet on that field with facet.mincount=2
> and the facets returned would be the ones with duplicate values.  You’d have to drill down
> on each of the facets returned to find the actual docs.
>
>     Erik
>
>> On Dec 2, 2014, at 10:57 AM, Peter Kirk <pk@alpha-solutions.dk> wrote:
>>
>> Hi
>>
>> Is it possible to formulate a Solr query which finds all documents which have the
>> same value in a particular field?
>> Note, I don't know what the value is, I just want to find all documents with duplicate
>> values.
>>
>> For example, I have 5 documents:
>>
>> Doc1: field Name = Peter
>> Doc2: field Name = Jack
>> Doc3: field Name = Peter
>> Doc4: field Name = Paul
>> Doc5: field Name = Jack
>>
>>
>> If I executed the query, it would find documents Doc1 and Doc3 (Peter is the same),
>> and Doc2 and Doc5 (Jack is the same).
>>
>>
>>
>> Thanks,
>> Peter
>
