lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexWriter flush/commit exception
Date Tue, 17 Dec 2013 15:44:09 GMT
On Mon, Dec 16, 2013 at 7:33 AM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
> I am trying to model a transaction log for Lucene, creating one
> transaction log per commit.
>
> Things work fine during normal operations, but I cannot work out what
> happens in these two cases:
>
> a. IOException during Index-Commit
>
> Will the index be restored to the previous commit point? Can I blindly
> retry the operations from the current transaction log after some time interval?

Yes: if an IOException is thrown from IndexWriter.commit then the
commit failed and the index still "shows" the previous successful
commit.
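Because a failed commit leaves the previous commit point visible, one way to act on this is to retry the whole unit of work (re-apply the current transaction log, then commit) after a delay. The following is a minimal plain-Java sketch of such a retry loop under that assumption; `CommitRetry`, `IOAction` and `runWithRetry` are hypothetical names, not Lucene APIs, and the sketch does not decide whether the IndexWriter itself is still usable after the failure.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical retry helper: runs a unit of work that may throw a transient
 * IOException (e.g. "replay transaction log, then commit"), waiting a fixed
 * delay between attempts.
 */
final class CommitRetry {

    /** Hypothetical functional interface for work that may fail with an IOException. */
    @FunctionalInterface
    interface IOAction {
        void run() throws IOException;
    }

    private CommitRetry() {}

    /**
     * Runs {@code action} up to {@code maxAttempts} times and returns the
     * attempt number that succeeded. Rethrows the last IOException if every
     * attempt fails.
     */
    static int runWithRetry(IOAction action, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (IOException e) {
                // A failed commit leaves the previous commit point in place,
                // so the whole unit of work can be attempted again.
                last = e;
                if (attempt < maxAttempts) {
                    TimeUnit.MILLISECONDS.sleep(delayMillis);
                }
            }
        }
        throw last; // non-null: the loop ran at least once and every attempt failed
    }
}
```

With Lucene, the action might be a lambda such as `() -> { replayLog(writer, log); writer.commit(); }`, where `replayLog` is a hypothetical helper that re-applies the logged operations, ideally through `IndexWriter.updateDocument` so a replay cannot create duplicates. A fixed delay keeps the sketch short; exponential backoff is a common alternative.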

> b. IOException during Background-Flush
>
> Will all the RAM buffers for that DWPT (DocumentsWriterPerThread),
> including buffered deletes, be cleaned up? Since flush() is per-thread and
> asynchronous, it clearly conflicts with my transaction-log-per-commit approach, right?
>
> Most of the time these IOExceptions are temporary and recoverable (e.g.
> with Solr's HDFSDirectory), so I definitely need to retry these operations
> after some time interval.

IOExceptions during flush are trickier. Often such an exception means
that all documents assigned to that segment are lost, but not
necessarily (e.g. not if the IOE happened while creating a compound file).

IOExceptions during add/updateDocument are also possible (e.g. because
stored fields and term vectors are written per document). Such an
exception can result in losing all documents in that one segment as
well (an aborting exception), whereas an IOE thrown by the analyzer,
for example, will just result in that one document being lost (a
non-aborting exception).

Since you cannot know which case it was, it's probably safest to
define a primary key field and always use IndexWriter.updateDocument.
This way, if the document was in fact not lost and you re-index it, you
simply replace it instead of creating a duplicate.
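To illustrate why a primary key plus updateDocument makes replay safe, here is a toy in-memory model in plain Java. It is not Lucene and does not model segments or flushing; `ReplayDemo`, `LogEntry` and the two replay methods are hypothetical names, and the map keyed on `id` stands in for a delete-by-term followed by an add.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (NOT Lucene) of replaying a transaction log after a failure in
 * which some logged documents may already have made it into the index.
 */
final class ReplayDemo {

    /** Hypothetical log entry: a primary-key value plus the document contents. */
    static final class LogEntry {
        final String id;
        final String body;

        LogEntry(String id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    private ReplayDemo() {}

    /** Models addDocument-style replay: every entry is appended, so survivors get duplicated. */
    static List<LogEntry> replayWithAdd(List<LogEntry> indexed, List<LogEntry> log) {
        List<LogEntry> index = new ArrayList<>(indexed);
        index.addAll(log);
        return index;
    }

    /** Models updateDocument-style replay: an entry replaces any document with the same id. */
    static Map<String, LogEntry> replayWithUpdate(List<LogEntry> indexed, List<LogEntry> log) {
        Map<String, LogEntry> index = new LinkedHashMap<>();
        for (LogEntry e : indexed) {
            index.put(e.id, e);
        }
        for (LogEntry e : log) {
            index.put(e.id, e); // like delete-by-key followed by add
        }
        return index;
    }

    public static void main(String[] args) {
        List<LogEntry> log = List.of(new LogEntry("1", "a"), new LogEntry("2", "b"));
        // Suppose the failure lost doc "2" but doc "1" actually reached the index.
        List<LogEntry> survived = List.of(new LogEntry("1", "a"));
        System.out.println(replayWithAdd(survived, log).size());    // 3: doc "1" is duplicated
        System.out.println(replayWithUpdate(survived, log).size()); // 2: no duplicates
    }
}
```

In Lucene itself the keyed call is `writer.updateDocument(new Term("id", id), doc)`, where `"id"` is whatever primary-key field you define; indexing that field untokenized (e.g. as a `StringField`) lets the Term match exactly one document.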

Mike McCandless

http://blog.mikemccandless.com


