lucene-solr-user mailing list archives

From "thakkar.aayush" <>
Subject Customizing Solr Dedupe
Date Wed, 01 Apr 2015 10:35:47 GMT
I'm facing a challenge with de-duplication of Solr documents.

De-duplication is done using TextProfileSignature with the following parameters:
<str name="fields">field1, field2, field3</str> 
<str name="quantRate">0.5</str>
<str name="minTokenLen">3</str>
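
For context, these parameters would normally sit inside a SignatureUpdateProcessorFactory entry in solrconfig.xml. The following is only a minimal sketch of such a chain: the chain name "dedupe", the signature field name "signature", and the overwriteDupes setting are assumptions for illustration and are not taken from my actual setup.

```xml
<!-- Sketch of a dedupe update chain; chain name, signatureField and
     overwriteDupes are assumed values, not confirmed configuration. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Field that stores the computed signature (assumed name) -->
    <str name="signatureField">signature</str>
    <!-- Whether a later document replaces an earlier one with the same signature (assumed) -->
    <bool name="overwriteDupes">true</bool>
    <!-- Parameters as listed above -->
    <str name="fields">field1, field2, field3</str>
    <str name="signatureClass">solr.processor.TextProfileSignature</str>
    <str name="quantRate">0.5</str>
    <str name="minTokenLen">3</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With a chain like this, a single signature is computed over all three fields together, so field1 and field2 are not compared separately from field3.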

Here field3 is ordinary text spanning a few lines.
field1 and field2 can each contain up to 5 or 6 words.

I want two documents to be treated as duplicates when field1 and field2 are
exactly the same and 90% of the lines in field3 match those in the other document.

Is there any way to achieve this?
