lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: index duplicate records from data source into 1 document
Date Thu, 19 Mar 2015 20:15:19 GMT
bq: Am I right to say we need to combine duplicate records
into 1 before feeding them to Solr to index?

That's what I'd do. As Shawn says, if you simply fire them both at
Solr, the more recent one will replace the older one.
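
A minimal sketch of doing that merge on the client side, in plain Java with no Solr dependency. It assumes each source record is a Map of field name to value and that records sharing the same id field describe the same item; the class and method names (DuplicateMerger, mergeById) are hypothetical. Every field is collected into an ordered set, so conflicting values end up multi-valued instead of one record silently replacing the other.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical client-side helper: collapses records sharing an id before indexing. */
public class DuplicateMerger {

    /**
     * Groups records by idField and unions each field's values, so two records
     * with the same id become one document instead of the later one replacing
     * the earlier one in Solr.
     */
    public static List<Map<String, Set<Object>>> mergeById(
            List<? extends Map<String, ?>> records, String idField) {
        // LinkedHashMap keeps documents in the order their ids were first seen.
        Map<Object, Map<String, Set<Object>>> byId = new LinkedHashMap<>();
        for (Map<String, ?> record : records) {
            Object id = record.get(idField);
            if (id == null) {
                throw new IllegalArgumentException("record has no " + idField);
            }
            Map<String, Set<Object>> doc =
                    byId.computeIfAbsent(id, k -> new LinkedHashMap<>());
            for (Map.Entry<String, ?> field : record.entrySet()) {
                // Identical values collapse; distinct values accumulate.
                doc.computeIfAbsent(field.getKey(), k -> new LinkedHashSet<>())
                   .add(field.getValue());
            }
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = List.of(
                Map.of("id", "P1", "name", "Widget", "color", "red"),
                Map.of("id", "P1", "name", "Widget", "color", "blue"),
                Map.of("id", "P2", "name", "Gadget"));
        for (Map<String, Set<Object>> doc : mergeById(records, "id")) {
            System.out.println(doc);
        }
    }
}
```

Each merged map could then be turned into a SolrJ SolrInputDocument by calling addField once per value. Any field that can end up with more than one value has to be declared multiValued in the schema, or Solr will reject the document. Union-of-values is only one possible rule; first-wins or last-wins per field may suit other data better.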

Best,
Erick

On Thu, Mar 19, 2015 at 7:44 AM, Shawn Heisey <apache@elyograg.org> wrote:
> On 3/19/2015 2:09 AM, Derek Poh wrote:
>> Am I right to say we need to combine duplicate records into 1
>> before feeding them to Solr to index?
>>
>> I am coming from Endeca, which supports combining duplicate records
>> into 1 record during indexing. I was wondering if Solr supports this.
>
> If you index multiple documents with the same uniqueKey field value, Solr
> will delete the previous document and index the new one.  The data in
> the previous document is not carried over into the new one.
>
> You could in theory write a custom UpdateRequestProcessor that looks for
> the previous document and merges it in whatever way you desire, so the
> combined information is what will be indexed, and configure Solr to use
> that update processor, but this capability is not available out of the
> box.
>
> An update processor that does this should probably be included with
> Solr, but it would either need to be highly configurable, or everyone
> would need to agree on exactly what rules should be followed when
> combining duplicate records.
>
> Thanks,
> Shawn
>
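
Shawn's point about agreeing on merge rules can be illustrated with the field-level logic such a custom update processor might apply once it has found the previous document. This is a plain-Java sketch, not Solr's UpdateRequestProcessor API: the FieldMergePolicy class and its KEEP_OLD / KEEP_NEW / UNION rules are hypothetical, documents are modeled as maps of field name to value list, and fetching the existing document is not shown. In a real processor, logic like this would presumably sit inside an override of processAdd(AddUpdateCommand) on a subclass of org.apache.solr.update.processor.UpdateRequestProcessor, registered through an UpdateRequestProcessorFactory in an update chain in solrconfig.xml.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-field rules for combining two records that share a unique key. */
public class FieldMergePolicy {

    public enum Rule { KEEP_OLD, KEEP_NEW, UNION }

    private final Map<String, Rule> rules;
    private final Rule defaultRule;

    public FieldMergePolicy(Map<String, Rule> rules, Rule defaultRule) {
        this.rules = rules;
        this.defaultRule = defaultRule;
    }

    /** Returns a new document combining oldDoc and newDoc field by field. */
    public Map<String, List<Object>> merge(Map<String, List<Object>> oldDoc,
                                           Map<String, List<Object>> newDoc) {
        // Visit every field present in either document, old fields first.
        Set<String> fieldNames = new LinkedHashSet<>(oldDoc.keySet());
        fieldNames.addAll(newDoc.keySet());

        Map<String, List<Object>> result = new LinkedHashMap<>();
        for (String field : fieldNames) {
            List<Object> oldVals = oldDoc.getOrDefault(field, List.of());
            List<Object> newVals = newDoc.getOrDefault(field, List.of());
            List<Object> merged;
            switch (rules.getOrDefault(field, defaultRule)) {
                case KEEP_OLD:
                    // Keep the existing value unless the old document lacked the field.
                    merged = oldVals.isEmpty() ? newVals : oldVals;
                    break;
                case KEEP_NEW:
                    // Take the incoming value unless the new document lacks the field.
                    merged = newVals.isEmpty() ? oldVals : newVals;
                    break;
                default: {
                    // UNION: all distinct values, old ones first.
                    Set<Object> union = new LinkedHashSet<>(oldVals);
                    union.addAll(newVals);
                    merged = new ArrayList<>(union);
                }
            }
            result.put(field, new ArrayList<>(merged));
        }
        return result;
    }

    public static void main(String[] args) {
        FieldMergePolicy policy = new FieldMergePolicy(
                Map.of("title", Rule.KEEP_OLD, "price", Rule.KEEP_NEW), Rule.UNION);

        Map<String, List<Object>> oldDoc = new LinkedHashMap<>();
        oldDoc.put("id", List.of("P1"));
        oldDoc.put("title", List.of("Widget"));
        oldDoc.put("price", List.of(10));
        oldDoc.put("color", List.of("red"));

        Map<String, List<Object>> newDoc = new LinkedHashMap<>();
        newDoc.put("id", List.of("P1"));
        newDoc.put("title", List.of("Widget v2"));
        newDoc.put("price", List.of(12));
        newDoc.put("color", List.of("blue"));

        System.out.println(policy.merge(oldDoc, newDoc));
        // prints {id=[P1], title=[Widget], price=[12], color=[red, blue]}
    }
}
```

Making the rule per field, with a default, is one way to get the "highly configurable" behavior Shawn mentions; a different data source might need other rules, such as keeping the value with the newest timestamp.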
