lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: index duplicate records from data source into 1 document
Date Wed, 18 Mar 2015 15:21:34 GMT
I'd use SolrJ, pull the rows ordered by productId, and combine records
with the same product ID into a single doc.

Here's a starter set for indexing from a DB with SolrJ. It has Tika
processing in it as well, but you can pull that out pretty easily.

https://lucidworks.com/blog/indexing-with-solrj/
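
As a rough illustration of the "combine records with the same product ID"
step, here is a minimal sketch in plain Java. It is not taken from the blog
post above. The Row class, the combine method and the sample values are
hypothetical stand-ins for whatever your DB query or file reader returns,
and it only does the grouping, not the SolrJ calls.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CombineByProductId {

    // Hypothetical row type: one (productId, businessType) pair per source record.
    static final class Row {
        final String productId;
        final String businessType;

        Row(String productId, String businessType) {
            this.productId = productId;
            this.businessType = businessType;
        }
    }

    // Collect every business type seen for a product ID under that ID.
    // LinkedHashMap/LinkedHashSet keep first-seen order and drop exact repeats.
    static Map<String, Set<String>> combine(List<Row> rows) {
        Map<String, Set<String>> docs = new LinkedHashMap<>();
        for (Row row : rows) {
            docs.computeIfAbsent(row.productId, id -> new LinkedHashSet<>())
                .add(row.businessType);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Sample rows mirroring the table in the original question.
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("12345", "Exporter"));
        rows.add(new Row("12345", "Agent"));
        rows.add(new Row("12366", "Manufacturer"));
        rows.add(new Row("12377", "Exporter"));
        rows.add(new Row("12377", "Distributor"));

        // One entry per product ID, each carrying all of its business types.
        combine(rows).forEach((id, types) -> System.out.println(id + " -> " + types));
    }
}
```

Each map entry would then become one SolrInputDocument, calling addField
once for the ID and once per business type (the target field has to be
multiValued in the schema; the field names are up to you). If the rows
really do arrive ordered by productId, you could instead send each doc as
soon as the ID changes rather than holding the whole map in memory.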

Best,
Erick

On Wed, Mar 18, 2015 at 2:52 AM, Derek Poh <dpoh@globalsources.com> wrote:
> Hi
>
> I have duplicate records in my source data (DB or delimited files). For
> simplicity's sake, they are of the following nature:
>
> Product Id    Business Type
> -----------------------------------
> 12345         Exporter
> 12345         Agent
> 12366         Manufacturer
> 12377         Exporter
> 12377         Distributor
>
> There are other fields with multiple values as well.
>
> How do I index the duplicate records into 1 document? E.g. Product Id 12345
> will be 1 document, 12366 as 1 document and 12377 as 1 document.
>
> -Derek
