lucene-dev mailing list archives

From Chuck Williams
Subject Re: After kill -9 index was corrupt
Date Fri, 29 Sep 2006 23:37:29 GMT
Hi All,

I found this issue.  There is no problem in Lucene, and I'd like to
leave this thread with that assertion to avoid confusing future archive

The index was actually not corrupt at all.  I use ParallelReader and
ParallelWriter.  A kill -9 can leave the subindexes out of sync.  My
recovery code repairs this on restart by noticing the indexes are
out-of-sync, deleting the document(s) that were added to some
subindex(es) but not the other(s), then optimizing to resync the doc-ids.
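As an illustration of the detection step described above, here is a minimal sketch in plain Java. It does not use the Lucene API and is not the actual recovery code. It assumes each subindex is represented only by its document count (the names `ParallelResync` and `surplusDocs` are hypothetical) and that the surplus documents are the trailing, most recently added ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParallelResync {

    /**
     * Given the document count of each subindex, returns how many trailing
     * documents each out-of-sync subindex holds beyond the smallest count.
     * Subindexes that are already at the common count are omitted.
     * Assumption: surplus documents are always the last ones added.
     */
    static Map<String, Integer> surplusDocs(Map<String, Integer> docCounts) {
        // The count shared by all subindexes is the smallest one.
        int common = docCounts.values().stream()
                .mapToInt(Integer::intValue)
                .min()
                .orElse(0);

        Map<String, Integer> surplus = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : docCounts.entrySet()) {
            int extra = e.getValue() - common;
            if (extra > 0) {
                surplus.put(e.getKey(), extra);
            }
        }
        return surplus;
    }

    public static void main(String[] args) {
        // Hypothetical subindex names and counts, for illustration only.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("main", 100);
        counts.put("aux", 102);
        counts.put("meta", 100);
        System.out.println(surplusDocs(counts));
    }
}
```

An empty result means the subindexes already agree. Otherwise each entry names a subindex and the number of trailing documents to delete before optimizing.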

The issue is that my bulk updater does not at present support compound
file format and the recovery code forgot to turn that off prior to the
optimize!  Thus a .cfs file was created, which confused the bulk updater
-- it did not see a segment that was inside the cfs.
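A guard of the following kind would have caught the stray compound file before the bulk updater ran. This is only a sketch in plain Java that scans an index directory for `.cfs` files. The class and method names are hypothetical, and it is not part of Lucene or of the bulk updater. The fix itself is to turn compound file format off before the optimize; in Lucene releases of this period that setting is `IndexWriter.setUseCompoundFile(false)`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CompoundFileCheck {

    /** Returns the sorted names of all *.cfs files directly inside indexDir. */
    static List<String> compoundFiles(Path indexDir) throws IOException {
        List<String> found = new ArrayList<>();
        // The glob is matched against each entry's file name.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir, "*.cfs")) {
            for (Path p : stream) {
                found.add(p.getFileName().toString());
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: CompoundFileCheck <index-directory>");
            return;
        }
        List<String> cfs = compoundFiles(Paths.get(args[0]));
        if (cfs.isEmpty()) {
            System.out.println("No compound files found.");
        } else {
            System.out.println("Compound files present: " + cfs);
        }
    }
}
```

A non-empty result signals that some segment lives inside a compound file and would be invisible to code that only looks for the individual per-segment files.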

Sorry for the false alarm and thanks to all who helped with the original


Chuck Williams wrote on 09/11/2006 12:10 PM:
> I do have one module that does custom index operations.  This is my bulk
> updater.  It creates new index files for the segments it modifies and a
> new segments file, then uses the same commit mechanism as merging. 
> I.e., it copies its new segments file into "segments" with the commit
> lock only after all the new index files are closed.  In the problem
> scenario, I don't have any indication that the bulk updater was
> complicit but am of course fully exploring that possibility as well.
> The index was only reopened by the process after the kill -9 of the old
> process was completed, so there were not any threads still working on
> the old process.
> This remains a mystery.  Thanks for your analysis and suggestions.  If
> you have more ideas, please keep them coming!
> Chuck
> robert engels wrote on 09/11/2006 10:06 AM:
>> I am not stating that you did not uncover a problem. I am only stating
>> that it is not due to OS level caching.
>> Maybe your sequence of events triggered a reread of the index, while
>> some thread was still writing. The reread sees the 'unused segments'
>> and deletes them, and then the other thread writes the updated
>> 'segments' file.
>> From what you state, it seems that you are using some custom code for
>> index writing? (Maybe the NewIndexModified stuff)? Possibly there is
>> an issue there. Do you maybe have your own cleanup code that attempts
>> to remove unused segments from the directory? If so, that appears to
>> be the likely culprit to me.
>> On Sep 11, 2006, at 2:56 PM, Chuck Williams wrote:
>>> robert engels wrote on 09/11/2006 07:34 AM:
>>>> A kill -9 should not affect the OS's writing of dirty buffers
>>>> (including directory modifications). If this were the case, massive
>>>> system corruption would almost always occur every time a kill -9 was
>>>> used with any program.
>>>> The only thing a kill -9 affects is user level buffering. The OS
>>>> always maintains a consistent view of directory modifications and/or
>>>> file modifications that were requested by programs.
>>>> This entire discussion is pointless.
>>> Thanks everyone for your analysis.  It appears I do not have any
>>> explanation.  In my case, the process was in gc-limbo due to the memory
>>> leak and having butted up against its -Xmx.  The process was kill -9'd
>>> and then restarted.  The OS never crashed.  The server this is on is
>>> healthy; it has been used continually since this happened without being
>>> rebooted and no file system or any other issues.  When the process was
>>> killed, one thread was merging segments as part of flushing the ram
>>> buffer while closing the index, due to the prior kill -15.  When Lucene
>>> restarted, the segments file contained a segment name for which there
>>> were no corresponding index data files.
>>> Chuck
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail: