lucene-solr-user mailing list archives

From Derek Poh <d...@globalsources.com>
Subject Re: index duplicate records from data source into 1 document
Date Thu, 19 Mar 2015 08:09:24 GMT
Hi Erick

Am I right to say that we need to combine the duplicate records into one
before feeding them to Solr to index?

I am coming from Endeca, which supports combining duplicate records
into one record during indexing. I was wondering whether Solr supports this.

-Derek

On 3/18/2015 11:21 PM, Erick Erickson wrote:
> I'd use SolrJ, pull the docs by productId order and combine records
> with the same product ID into a single doc.
>
> Here's a starter set for indexing from a DB with SolrJ. It has Tika
> processing in it as well, but you can pull that out pretty easily.
>
> https://lucidworks.com/blog/indexing-with-solrj/
>
> Best,
> Erick
>
> On Wed, Mar 18, 2015 at 2:52 AM, Derek Poh <dpoh@globalsources.com> wrote:
>> Hi
>>
>> I have duplicate records in my source data (DB or delimited files). For
>> simplicity's sake, they are of the following nature:
>>
>> Product Id    Business Type
>> -----------------------------------
>> 12345         Exporter
>> 12345         Agent
>> 12366         Manufacturer
>> 12377         Exporter
>> 12377         Distributor
>>
>> There are other fields with multiple values as well.
>>
>> How do I index the duplicate records into 1 document? E.g. Product Id 12345
>> will be 1 document, 12366 will be 1 document and 12377 will be 1 document.
>>
>> -Derek
>
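
Erick's suggestion (read the rows in product ID order and fold rows with the same ID into one document) might look roughly like the sketch below. It is only an illustration under assumptions: `ProductDoc`, `DuplicateMerger` and `mergeSortedRows` are hypothetical names, the rows are assumed to arrive already sorted by product ID (for example via an `ORDER BY` in the DB query), and the Solr side is left out. In a real indexer each merged record would presumably be copied into a SolrJ `SolrInputDocument`, calling `addField` once per value on a field that the schema declares as multiValued.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical holder for one merged product; not a SolrJ class.
class ProductDoc {
    final String productId;
    // A set, so a business type repeated in the source is kept only once.
    final Set<String> businessTypes = new LinkedHashSet<>();

    ProductDoc(String productId) {
        this.productId = productId;
    }
}

// Hypothetical helper that collapses consecutive rows sharing a product ID.
class DuplicateMerger {

    // Each row is {productId, businessType}. The rows must already be
    // sorted by productId, so all duplicates of an ID are adjacent.
    static List<ProductDoc> mergeSortedRows(List<String[]> sortedRows) {
        List<ProductDoc> docs = new ArrayList<>();
        ProductDoc current = null;
        for (String[] row : sortedRows) {
            String productId = row[0];
            String businessType = row[1];
            // A change of ID means the previous product is complete.
            if (current == null || !current.productId.equals(productId)) {
                current = new ProductDoc(productId);
                docs.add(current);
            }
            current.businessTypes.add(businessType);
        }
        return docs;
    }
}
```

With the five sample rows above, this yields three merged records: 12345 with Exporter and Agent, 12366 with Manufacturer, and 12377 with Exporter and Distributor. Sorting first is what lets the loop keep only one product in memory at a time; other multi-valued fields could be accumulated in the same way.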

