lucene-solr-user mailing list archives

From P Williams <williams.tricia.l...@gmail.com>
Subject Re: avoid overwrite in DataImportHandler
Date Thu, 08 Dec 2011 19:11:09 GMT
Ah. Thanks, Erick.

I see now that my question is different from sabman's.

Is there a way to use the DataImportHandler's "full-import" command so that
it does not delete the existing material before it begins?

Thanks,
Tricia
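
DIH accepts `clean` as a request parameter on `full-import` (it is the `clean` option quoted further down, default `true`). Passing `clean=false` should stop the up-front delete of the existing index. The sketch below only builds such a request URL. The base URL `http://localhost:8983/solr/dataimport` and the helper name `full_import_url` are hypothetical; the real handler path depends on how the DataImportHandler is registered in your solrconfig.xml. Even with `clean=false`, any incoming document whose `<uniqueKey>` value matches an existing document will still replace it.

```python
from urllib.parse import urlencode


def full_import_url(base_url: str, clean: bool = False, commit: bool = True) -> str:
    """Build a DataImportHandler full-import request URL.

    base_url is an assumption: point it at wherever your DIH handler is
    registered (this helper is illustrative, not part of Solr).
    """
    params = {
        "command": "full-import",
        # clean defaults to true for full-import; false skips the initial delete.
        "clean": str(clean).lower(),
        # commit controls whether DIH commits once the import finishes.
        "commit": str(commit).lower(),
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance.
    print(full_import_url("http://localhost:8983/solr/dataimport"))
```

Sending a GET request to the printed URL (with curl or a browser, for example) would start the import without first clearing the index.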

On Thu, Dec 8, 2011 at 6:35 AM, Erick Erickson <erickerickson@gmail.com> wrote:

> This is all controlled by Solr via the <uniqueKey> field in your schema.
> Just remove that entry.
>
> But then it's all up to you to handle the fact that there will be multiple
> documents with the same ID, all returned as results of a query. And no
> matter what program adds the data, *nothing* will be overwritten; DIH has
> no part in that decision.
>
> Deduplication is about defining some fields in your record and avoiding
> adding another document if the contents are "close", where close is a
> slippery concept. I don't think it's related to your problem at all.
>
> Best
> Erick
>
> On Wed, Dec 7, 2011 at 3:27 PM, P Williams <williams.tricia.list@gmail.com> wrote:
> > Hi,
> >
> > I've wondered the same thing myself. I feel like the "clean" parameter has
> > something to do with it, but it doesn't work as I'd expect either. Thanks
> > in advance to anyone who can answer this question.
> >
> > *clean* : (default 'true'). Tells whether to clean up the index before the
> > indexing is started.
> >
> > Tricia
> >
> > On Wed, Dec 7, 2011 at 12:49 PM, sabman <saby83@gmail.com> wrote:
> >
> >> I have a unique ID defined for the documents I am indexing. I want to avoid
> >> overwriting the documents that have already been indexed. I am using
> >> XPathEntityProcessor and TikaEntityProcessor to process the documents.
> >>
> >> The DataImportHandler does not seem to have an option to set
> >> overwrite=false. I have read on other forums that I should use
> >> deduplication instead, but I don't see how it is related to my problem.
> >>
> >> Any help on this (or an explanation of how deduplication would apply to my
> >> problem) would be great. Thanks!
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/avoid-overwrite-in-DataImportHandler-tp3568435p3568435.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
>

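Erick's point about `<uniqueKey>` can be illustrated with a toy model. The `ToyIndex` class below is a hypothetical in-memory stand-in, not Solr code or a Solr API. It only mimics the behaviour described in the thread: with a unique key configured, adding a document whose key already exists replaces the older one; with no unique key, every add is kept, so duplicates all come back from a lookup.

```python
from __future__ import annotations

from typing import Optional


class ToyIndex:
    """Hypothetical in-memory stand-in for an index (not Solr's implementation)."""

    def __init__(self, unique_key: Optional[str] = "id"):
        # unique_key=None models a schema with the <uniqueKey> entry removed.
        self.unique_key = unique_key
        self.docs: list[dict] = []

    def add(self, doc: dict) -> None:
        if self.unique_key is not None:
            key = doc[self.unique_key]
            # Later add wins: drop any earlier document with the same key.
            self.docs = [d for d in self.docs if d.get(self.unique_key) != key]
        self.docs.append(doc)

    def find(self, field: str, value) -> list[dict]:
        # Return every stored document whose field equals value.
        return [d for d in self.docs if d.get(field) == value]


if __name__ == "__main__":
    keyed = ToyIndex("id")
    keyed.add({"id": "1", "title": "old"})
    keyed.add({"id": "1", "title": "new"})
    print(keyed.find("id", "1"))  # only the "new" document survives

    unkeyed = ToyIndex(None)
    unkeyed.add({"id": "1", "title": "old"})
    unkeyed.add({"id": "1", "title": "new"})
    print(unkeyed.find("id", "1"))  # both documents are returned
```

The trade-off Erick describes shows up directly: removing the key prevents overwrites, but the application must then cope with several documents sharing the same ID.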