lucene-solr-user mailing list archives

From "thakkar.aayush" <thakkar.aay...@gmail.com>
Subject Customzing Solr Dedupe
Date Wed, 01 Apr 2015 10:35:47 GMT
I'm facing a challenge with de-duplication of Solr documents.

De-duplication is done using TextProfileSignature with the following parameters: 
<str name="fields">field1, field2, field3</str> 
<str name="quantRate">0.5</str>
<str name="minTokenLen">3</str>

Here, field3 is ordinary text with a few lines of data.
field1 and field2 can each contain up to 5 or 6 words of data.

I want two documents to be treated as duplicates when the data in field1 and field2 is exactly the same
and 90% of the lines in field3 match those in the other document.
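
To make the rule concrete, here is a rough sketch in plain Java of the comparison I have in mind. It is not Solr code and does not use TextProfileSignature. The class name DedupeRule, its method names, and the reading of "90% of the lines" as "the fraction of the new document's field3 lines that also appear in the other document" are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr: spells out the desired duplicate rule.
public class DedupeRule {

    // Assumed threshold: at least 90% of the field3 lines must match.
    static final double LINE_MATCH_THRESHOLD = 0.9;

    // Split text into trimmed, non-empty lines.
    static List<String> lines(String text) {
        return Arrays.stream(text.split("\\R"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Fraction of the lines in `candidate` that also occur in `existing`.
    // Assumption: an empty candidate never counts as a match.
    static double lineOverlap(String candidate, String existing) {
        List<String> candidateLines = lines(candidate);
        if (candidateLines.isEmpty()) {
            return 0.0;
        }
        Set<String> existingLines = new HashSet<>(lines(existing));
        long matched = candidateLines.stream()
                .filter(existingLines::contains)
                .count();
        return (double) matched / candidateLines.size();
    }

    // Duplicate only if field1 and field2 are exactly equal
    // and the field3 line overlap reaches the threshold.
    static boolean isDuplicate(String f1a, String f2a, String f3a,
                               String f1b, String f2b, String f3b) {
        return f1a.equals(f1b)
                && f2a.equals(f2b)
                && lineOverlap(f3a, f3b) >= LINE_MATCH_THRESHOLD;
    }
}
```

With this rule, two documents whose field1 and field2 are identical and whose field3 shares 9 of 10 lines would be duplicates, while any difference in field1 or field2 would keep them distinct regardless of field3.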

Is there any way to achieve this?



--
View this message in context: http://lucene.472066.n3.nabble.com/Customzing-Solr-Dedupe-tp4196879.html
Sent from the Solr - User mailing list archive at Nabble.com.
