lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: index duplicate records from data source into 1 document
Date Thu, 19 Mar 2015 14:44:49 GMT
On 3/19/2015 2:09 AM, Derek Poh wrote:
> Am I right to say we need to do the combine of duplicate records into 1
> before feeding it to Solr to index?
>
> I am coming from Endeca which support the combine of duplicate records
> into 1 record during indexing. Was wondering if Solr support this.

If you index multiple documents with the same uniqueKey field value, Solr
will delete the previous document and index the new one.  The data in
the previous document is discarded; nothing from it is merged into the
new one.

You could in theory write a custom UpdateRequestProcessor that looks for
the previous document and merges it in whatever way you desire, so the
combined information is what will be indexed, and configure Solr to use
that update processor, but this capability is not available out of the
box.

An update processor that does this should probably be included with
Solr, but it would either need to be highly configurable, or everyone
would need to agree on exactly what rules should be followed when
combining duplicate records.
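
Until something like that exists, the merge can be done before the data
reaches Solr.  Below is a minimal Python sketch of that idea, not anything
Solr ships.  The function name merge_duplicates, the "id" key field, and
the rule names "first", "last" and "collect" are all assumptions made up
for illustration; the rules dictionary stands in for the kind of
per-field configuration a real solution would need.

```python
def merge_duplicates(records, key_field="id", rules=None):
    """Combine records that share key_field into one document each.

    `rules` maps a field name to a hypothetical rule name:
      "first"   - keep the value from the first record seen
      "last"    - keep the value from the last record seen
      "collect" - gather distinct values into a list (the default)
    """
    rules = rules or {}
    merged = {}  # key value -> combined document, in first-seen order
    for record in records:
        key = record[key_field]
        doc = merged.setdefault(key, {key_field: key})
        for field, value in record.items():
            if field == key_field:
                continue
            rule = rules.get(field, "collect")
            if rule == "first":
                doc.setdefault(field, value)
            elif rule == "last":
                doc[field] = value
            elif rule == "collect":
                values = doc.setdefault(field, [])
                if value not in values:
                    values.append(value)
            else:
                raise ValueError("unknown merge rule: %r" % rule)
    return list(merged.values())


if __name__ == "__main__":
    # Made-up sample rows; two of them share the same id.
    rows = [
        {"id": "1", "name": "Widget", "color": "red"},
        {"id": "1", "name": "Widget v2", "color": "blue"},
        {"id": "2", "name": "Gadget", "color": "red"},
    ]
    for doc in merge_duplicates(rows, rules={"name": "first"}):
        print(doc)
```

The resulting documents can then be sent to Solr as usual, with any
collected fields defined as multiValued in the schema.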

Thanks,
Shawn

