lucene-solr-user mailing list archives

From Ahmet Arslan <>
Subject Re: How to run the solr dedup for the document which match 80% or match almost.
Date Tue, 27 Dec 2011 11:46:20 GMT
> I am running dedup on my Solr instance, using the content and url
> fields. My question: if I want to eliminate records whose content
> field matches 80% or 90%, how should I proceed?
>
> I have already changed the part of solrconfig.xml that dedup requires
> (the update request processor chain), and that part is working fine.

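You can use TextProfileSignature, which is a fuzzy hashing implementation, instead of Lookup3Signature. Lookup3Signature is an exact hash, so it only catches documents whose signature fields are identical.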
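
As a rough sketch, the change in the dedup chain would look something like the fragment below. The chain name "dedupe" and the signature field name "signature" are assumptions taken from the usual example setup, so keep whatever names your existing chain already uses. Only the signatureClass line changes, and I have limited "fields" to content because that is the field you want near-duplicate matching on. Note that this does not give you an exact 80% or 90% threshold. How loosely documents match depends on the profile the signature builds, which I believe can be tuned with the quantRate and minTokenLen parameters, and it tends to work better on longer text.

```xml
<!-- Sketch only: chain name and signatureField are assumed, not taken from your config -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- field that stores the computed signature (assumed name) -->
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <!-- fuzzy matching on content only -->
    <str name="fields">content</str>
    <!-- was: solr.processor.Lookup3Signature -->
    <str name="signatureClass">solr.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Documents indexed before the change keep their old signatures, so you would need to reindex for the new signature to apply to them.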
