lucene-solr-user mailing list archives

From Markus Jelsma <>
Subject RE: Re: Dedupe and overwriteDupes setting
Date Tue, 11 May 2010 16:45:30 GMT
Thanks Mark,



I already fixed it in the meantime and quickly went on with the usual stuff; I know, bad me
=). I'll file a Jira report tomorrow and update the wiki on this subject. I can also file
another ticket from another current topic on this subject: a proper use case
for the update handler to return information on which documents were rejected due to dedupe.


I think updating the wiki with links to those new Jira tickets would be
a good idea for other readers, wouldn't it?



-----Original message-----
From: Mark Miller <>
Sent: Tue 11-05-2010 17:25
Subject: Re: Dedupe and overwriteDupes setting

1. You need to set the sig field to indexed.
2. This should be added to the wiki
3. Want to make a JIRA issue? This is not very friendly behavior (when 
you have the sig field set to indexed=false and overwriteDupes=true it 
should likely complain)

- Mark
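
For reference, point 1 can be sketched against the field definition quoted below. This is a minimal sketch, not a confirmed fix from the thread: it assumes the rest of the field definition stays exactly as posted and changes only the indexed attribute. The other attributes, including multiValued, are copied unchanged from the question rather than recommended.

```xml
<!-- schema.xml (sketch): same "sig" field as in the question, with indexed
     switched to true as suggested in point 1. All other attributes are
     carried over unchanged from the original post. -->
<field name="sig" type="string" stored="true" indexed="true" multiValued="true" />
```

Documents indexed before the change would presumably need to be reindexed so that their signatures become searchable.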

On 5/11/10 4:13 AM, Markus Jelsma wrote:
> List,
> I've stumbled upon an issue with the deduplication mechanism. Depending on
> the overwriteDupes setting, it either deletes all documents (true) or does
> nothing at all (false).
> I use a slightly modified configuration:
>    <updateRequestProcessorChain name="dedupe">
>      <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
>        <bool name="enabled">true</bool>
>        <str name="signatureField">sig</str>
>        <bool name="overwriteDupes">true</bool>
>        <str name="fields">content</str>
>        <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
>      </processor>
>      <processor class="solr.LogUpdateProcessorFactory" />
>      <processor class="solr.RunUpdateProcessorFactory" />
>    </updateRequestProcessorChain>
>
>    <field name="sig" type="string" stored="true" indexed="false" multiValued="true" />
> After importing new documents I (only with overwriteDupes=false) can clearly
> see the correct signatures. Most documents have a distinct signature and some
> share the same because the content field's value is identical for those
> documents.
> Anyway, why does it delete all my documents? Any clues? The wiki is not very
> helpful on this subject.
> Cheers.
> Markus Jelsma - Technisch Architect - Buyways BV
> 050-8536620 / 06-50258350
