lucene-solr-user mailing list archives

From Derek Poh <d...@globalsources.com>
Subject Re: index duplicate records from data source into 1 document
Date Fri, 20 Mar 2015 06:31:59 GMT
Oh, that is how Solr works...

On 3/19/2015 10:44 PM, Shawn Heisey wrote:
> On 3/19/2015 2:09 AM, Derek Poh wrote:
>> Am I right to say we need to combine duplicate records into one
>> before feeding them to Solr to index?
>>
>> I am coming from Endeca, which supports combining duplicate records
>> into one record during indexing. I was wondering if Solr supports this.
> If you index multiple documents with the same uniqueId field value, Solr
> will delete the previous document and index the new one.  The data in
> the previous document is never seen.
>
> You could in theory write a custom UpdateRequestProcessor that looks for
> the previous document and merges it in whatever way you desire, so the
> combined information is what will be indexed, and configure Solr to use
> that update processor ...but this capability is not available out of the
> box.
>
> An update processor that does this should probably be included with
> Solr, but it would either need to be highly configurable, or everyone
> would need to agree on exactly what rules should be followed when
> combining duplicate records.
>
> Thanks,
> Shawn
>
>
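As a rough illustration of the thread's conclusion (duplicates have to be combined before they reach Solr), here is a minimal, self-contained Java sketch. It does not use Solr or SolrJ. The class and method names (MergeDuplicates, mergeById, lastWriteWins), the sample fields, and the "keep every distinct value" merge rule are all hypothetical choices made for this example. lastWriteWins only mimics Solr's default behaviour for a repeated unique key value, where the later document replaces the earlier one. mergeById combines the records first, so a single merged document per id could then be sent for indexing.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MergeDuplicates {

    // Hypothetical merge rule: each field of the merged document becomes the
    // set of all distinct values seen across records sharing the same id.
    // "First wins", "last wins", "sum", etc. would be equally valid rules.
    static Map<String, Map<String, Set<String>>> mergeById(
            List<Map<String, String>> records, String idField) {
        Map<String, Map<String, Set<String>>> merged = new LinkedHashMap<>();
        for (Map<String, String> rec : records) {
            String id = rec.get(idField);
            Map<String, Set<String>> doc =
                    merged.computeIfAbsent(id, k -> new LinkedHashMap<>());
            for (Map.Entry<String, String> e : rec.entrySet()) {
                // LinkedHashSet keeps values in the order the records arrived
                doc.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>())
                   .add(e.getValue());
            }
        }
        return merged;
    }

    // Mimics Solr's default handling of a repeated unique key value:
    // the later record replaces the earlier one entirely.
    static Map<String, Map<String, String>> lastWriteWins(
            List<Map<String, String>> records, String idField) {
        Map<String, Map<String, String>> index = new LinkedHashMap<>();
        for (Map<String, String> rec : records) {
            index.put(rec.get(idField), rec);
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two source records share id "P1"
        List<Map<String, String>> records = List.of(
                Map.of("id", "P1", "name", "Widget", "category", "Tools"),
                Map.of("id", "P1", "name", "Widget", "category", "Hardware"),
                Map.of("id", "P2", "name", "Gadget", "category", "Toys"));

        // Overwrite behaviour: only the second P1 record survives
        System.out.println(lastWriteWins(records, "id").get("P1").get("category"));
        // Merged behaviour: both categories are kept
        System.out.println(mergeById(records, "id").get("P1").get("category"));
    }
}
```

Running main prints "Hardware" followed by "[Tools, Hardware]". The merge rule is the part that would need to be configurable, as Shawn points out: different data sources will want different rules for combining the same field.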

