lucene-java-user mailing list archives

From Derek Lewis <de...@lewisd.com>
Subject Re: Unexpected returning false from IndexWriter.tryDeleteDocument
Date Fri, 20 Dec 2013 18:06:38 GMT
Hi Mike,

Thanks for the response.  I realize that merging could cause segments to be
deleted, resulting in tryDeleteDocument returning false.  However, I've
been unable to figure out why the scenario I've described would cause
segments to be merged.  I've tried duplicating it by writing indexes with
many segments and deleting all the documents in them, but I haven't had any
luck.

Can you suggest any ways the scenario I've outlined would cause merges?

Cheers,
Derek


On Fri, Dec 20, 2013 at 9:50 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> tryDeleteDocument will return false if the IndexReader is "stale",
> i.e. the segment that contains the docID you are trying to delete has
> been merged by IndexWriter.
>
> In this case you need to fall back to deleting by Term/Query.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Dec 20, 2013 at 12:12 PM, Derek Lewis <derek@lewisd.com> wrote:
> > Hello,
> >
> > I have a problem where IndexWriter.tryDeleteDocument is returning false
> > unexpectedly.  Unfortunately, it's in production, on indexes that have
> > since been merged and shunted around all over, and I've been unable to
> > create a scenario that duplicates the problem in any development
> > environments.  It also means I haven't been able to find out exact
> > details about the scenario, so some of this is extrapolation.
> >
> > The basic scenario is, I think, this:
> > There is a Lucene index with millions of documents, and a bunch of
> > segments.  Each of the documents has an associated "serialId" stored.
> > There are many, many duplicates, due to a transient error that occurred.
> > Our system attempts to perform a process whereby it merges the index
> > segments, and deletes the documents with duplicate serialIds, so that at
> > the end of the process, we have only one segment, and for each serialId
> > there is only one document.
> >
> > We have an IndexWriter we created with:
> > writer = new IndexWriter(
> >                     FSDirectory.open(indexdir),
> >                     config);
> >
> > We create a DirectoryReader:
> > final DirectoryReader nearRealtimeReader = DirectoryReader.open(writer,
> > false);
> >
> > which we use to iterate over the documents with:
> > for (int docId = 0; docId < nearRealtimeReader.maxDoc(); ++docId) {
> >
> > For any document whose serialId indicates it's a duplicate (i.e. we've
> > already seen that serialId), we delete it:
> > final boolean deletionSuccessful =
> > writer.tryDeleteDocument(nearRealtimeReader, docId);
> >
> > This works the vast majority of the time.  However, in this case, which I
> > haven't been able to reproduce, it returns false; we check for that and
> > throw an exception.
> >
> > What I found particularly interesting is that when our system re-schedules
> > this process and tries again, it eventually succeeds, despite nothing else
> > in our system writing to this index in the meantime.  (Before indexes are
> > shunted off to this merging process, they're "closed" to the rest of the
> > system.)  This seems to hint to me that maybe something is merging the
> > segments of this index, even though we throw an exception before we get to
> > the part of our code that calls:
> > writer.forceMerge(1, true);
> > writer.commit();
> >
> > Any ideas as to why this might be happening?
> >
> > We're using Lucene 4.4.0, on Java 7 64-bit, on Solaris.
>
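For readers following the thread, the duplicate-removal pass described in the quoted message can be modeled without Lucene as a rough sketch. Everything named here is hypothetical: DedupPass, DocIdDeleter and deleteDuplicates are illustrative names, and the String array stands in for reading each document's stored "serialId". In real code the tryDelete callback would presumably wrap writer.tryDeleteDocument(nearRealtimeReader, docId). Instead of throwing on the first false return, the sketch collects the refused docIDs so the caller can decide what to do with them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupPass {

    /** Hypothetical stand-in for writer.tryDeleteDocument(reader, docId). */
    public interface DocIdDeleter {
        boolean tryDelete(int docId);
    }

    /**
     * Walks docIDs in order, keeps the first document seen for each serialId,
     * and attempts to delete every later document with the same serialId.
     * Returns the docIDs whose deletion was refused (tryDelete returned false).
     *
     * serialIds[docId] is an assumption standing in for loading the stored
     * "serialId" field of each document from the reader.
     */
    public static List<Integer> deleteDuplicates(String[] serialIds, DocIdDeleter deleter) {
        Set<String> seen = new HashSet<>();
        List<Integer> refused = new ArrayList<>();
        for (int docId = 0; docId < serialIds.length; docId++) {
            if (seen.add(serialIds[docId])) {
                continue; // first occurrence of this serialId: keep the document
            }
            if (!deleter.tryDelete(docId)) {
                refused.add(docId); // deletion by docID was refused
            }
        }
        return refused;
    }
}
```

A non-empty result corresponds to the case in this thread where tryDeleteDocument returns false. Those docIDs are the ones that would need a retry with a fresh reader or the Term/Query fallback Mike mentions. Note that deleting by a serialId term alone would presumably also remove the copy meant to be kept, so a fallback would likely need some field that is unique per document.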
