lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@buyways.nl>
Subject RE: Re: Dedupe and overwriteDupes setting
Date Tue, 11 May 2010 16:45:30 GMT
Thanks Mark,

I already fixed it in the meantime and quickly went on with the usual stuff; I know, bad me
=). I'll file a Jira report tomorrow and update the wiki on this subject. I can also file
another ticket for another current topic on this subject: a proper use case for the update
handler to return information on which documents were rejected due to dedupe.

I think updating the wiki with links to those new Jira tickets would be a good idea for
other readers, wouldn't it?

Cheers,

-----Original message-----
From: Mark Miller <markrmiller@gmail.com>
Sent: Tue 11-05-2010 17:25
To: solr-user@lucene.apache.org; 
Subject: Re: Dedupe and overwriteDupes setting

1. You need to set the sig field to indexed.
2. This should be added to the wiki
3. Want to make a JIRA issue? This is not very friendly behavior (when 
you have the sig field set to indexed=false and overwriteDupes=true it 
should likely complain)
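
As a sketch of point 1, the schema field from the quoted message below would change only in its indexed attribute. This assumes nothing else about the field needs to differ. The likely reason is that overwriteDupes has to find existing documents by their signature, which is only possible on an indexed field.

```xml
<!-- schema.xml: same field as in the original message, with indexed switched to true -->
<field name="sig" type="string" stored="true" indexed="true" multiValued="true" />
```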



-- 
- Mark

http://www.lucidimagination.com


On 5/11/10 4:13 AM, Markus Jelsma wrote:
> List,
>
>
> I've stumbled upon an issue with the deduplication mechanism. It either
> deletes all documents or does nothing at all, depending on whether
> overwriteDupes is set to true or false, respectively.
>
> I use a slightly modified configuration:
>
>    <updateRequestProcessorChain name="dedupe">
>      <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
>        <bool name="enabled">true</bool>
>        <str name="signatureField">sig</str>
>        <bool name="overwriteDupes">true</bool>
>        <str name="fields">content</str>
>        <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
>      </processor>
>      <processor class="solr.LogUpdateProcessorFactory" />
>      <processor class="solr.RunUpdateProcessorFactory" />
>    </updateRequestProcessorChain>
>
>
>    <field name="sig" type="string" stored="true" indexed="false" multiValued="true" />
>
> After importing new documents I can clearly see the correct signatures, but
> only with overwriteDupes=false. Most documents have a distinct signature and
> some share the same one because the content field's value is identical for
> those documents.
>
>
> Anyway, why does it delete all my documents? Any clues? The wiki is not very
> helpful on this subject.
>
>
> Cheers.
>
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
