lucene-java-user mailing list archives

From Derek Lewis <de...@lewisd.com>
Subject Unexpected returning false from IndexWriter.tryDeleteDocument
Date Fri, 20 Dec 2013 17:12:54 GMT
Hello,

I have a problem where IndexWriter.tryDeleteDocument is returning false
unexpectedly.  Unfortunately, it's in production, on indexes that have
since been merged and shunted around all over, and I've been unable to
create a scenario that duplicates the problem in any development
environments.  It also means I haven't been able to find out exact details
about the scenario, so some of this is extrapolation.

The basic scenario is, I think, this:
There is a Lucene index with millions of documents and a number of segments.
Each document has an associated "serialId" stored.  There are a great many
duplicates, due to a transient error that occurred.
Our system attempts to perform a process whereby it merges the index
segments and deletes the documents with duplicate serialIds, so that at
the end of the process we have only one segment, and for each serialId
there is only one document.

We have an IndexWriter we created with:
writer = new IndexWriter(
    FSDirectory.open(indexdir),
    config);

We create a DirectoryReader:
final DirectoryReader nearRealtimeReader = DirectoryReader.open(writer,
false);

which we use to iterate over the documents with:
for (int docId = 0; docId < nearRealtimeReader.maxDoc(); ++docId) {

For any document whose serialId indicates it's a duplicate (i.e. we've
already seen that serialId), we delete it:
final boolean deletionSuccessful =
writer.tryDeleteDocument(nearRealtimeReader, docId);

This works the vast majority of the time.  However, in this case, which I
haven't been able to reproduce, it returns false; we check for that and
throw an exception.
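
For reference, the loop described above can be sketched roughly as follows.
This is a minimal sketch, not our production code: DocSource and
deleteDuplicates are hypothetical names, and the interface merely stands in
for the Lucene calls (maxDoc(), reading the stored serialId, and
tryDeleteDocument) so that the example runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of the duplicate-removal pass; deliberately has no Lucene dependency. */
final class DedupSketch {

    /** Hypothetical stand-in for the Lucene calls used in the real loop. */
    interface DocSource {
        int maxDoc();                  // stands in for nearRealtimeReader.maxDoc()
        String serialId(int docId);    // stands in for reading the stored "serialId"
        boolean tryDelete(int docId);  // stands in for writer.tryDeleteDocument(reader, docId)
    }

    /**
     * Walks docIds 0..maxDoc-1, keeps the first document seen for each serialId,
     * and tries to delete every later one. Returns the docIds that were deleted.
     * Throws if a delete attempt reports failure, as our process does.
     */
    static List<Integer> deleteDuplicates(DocSource source) {
        Set<String> seen = new HashSet<String>();
        List<Integer> deleted = new ArrayList<Integer>();
        for (int docId = 0; docId < source.maxDoc(); ++docId) {
            String serialId = source.serialId(docId);
            if (seen.add(serialId)) {
                continue; // first occurrence of this serialId: keep the document
            }
            if (!source.tryDelete(docId)) {
                throw new IllegalStateException(
                        "tryDelete returned false for docId " + docId);
            }
            deleted.add(docId);
        }
        return deleted;
    }
}
```

In the real code the three interface methods are backed by the near-real-time
reader and the IndexWriter shown above; the IllegalStateException corresponds
to the exception we throw when the deletion reports false.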

What I found particularly interesting is that when our system re-schedules
this process and tries again, it eventually succeeds, despite nothing else
in our system writing to this index in the meantime.  (Before indexes are
shunted off to this merging process, they're "closed" to the rest of the
system.)  This suggests to me that something may be merging the
segments of this index, even though we throw an exception before we get to
the part of our code that calls:
writer.forceMerge(1, true);
writer.commit();

Any ideas as to why this might be happening?

We're using Lucene 4.4.0, on Java 7 64-bit, on Solaris.
