lucene-dev mailing list archives

From Michael McCandless
Subject Re: Lock-less commits
Date Fri, 18 Aug 2006 18:41:43 GMT

>> It could in theory lead to starvation but this should be rare in
>> practice unless you have an IndexWriter that's constantly committing.
> An index with a small mergeFactor (say 2) and a small maxBufferedDocs
> (default 10), would have segments deleted every
> mergeFactor*maxBufferedDocs when rapidly adding documents.  It might
> help to start opening segments with the *last* segment, where segment
> deletions are most likely to happen.

That is true.  I like the idea of opening last segments first -- I'll do
that.
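
Roughly, the reader side could look like the sketch below.  It is in
Python for brevity rather than Lucene's Java, and it is not the actual
patch: `open_segments`, the `read_segment_names` callback and the retry
limit are all made-up names for illustration.  The point is only that
the newest segments get opened first, and that a vanished file means
"re-read the segments list and try again".

```python
import os
import tempfile

def open_segments(directory, read_segment_names, max_retries=5):
    """Open one file per segment, newest first, retrying if one vanished.

    Hypothetical helper: `read_segment_names` stands in for re-reading
    the segments file and returns names ordered oldest to newest.
    """
    for _ in range(max_retries):
        names = read_segment_names()
        handles = []
        try:
            # The newest segments are the most likely to be merged away,
            # so opening them first detects a stale list early.
            for name in reversed(names):
                handles.append(open(os.path.join(directory, name), "rb"))
            return handles[::-1]  # back in oldest-to-newest order
        except FileNotFoundError:
            # A writer committed underneath us: release what we opened
            # and start over from a fresh segments list.
            for handle in handles:
                handle.close()
    # This is the (hopefully rare) starvation case.
    raise OSError("segments kept changing; gave up after retries")
```

The retry cap is where starvation would show up: a writer that commits
faster than a reader can open its segments exhausts the retries.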

> Also, when loading a .del file, how would one tell if it didn't exist
> or if it was just deleted?
> I guess one would always need to write a .del file even if no docs
> were deleted.  Or, one could just order the deletes (delete optional
> files in a segment last).

Right, in order to handle this, I've modified the segments file to
also contain the current "generation" (the .N suffix) of each
segment's .del & norms suffixes.  This way when SegmentReader reads
the segment, it knows exactly which del/norms files it's supposed to
find.  For "doUndeleteAll()" I write a zero-length .del.N+1 file.
SegmentReader is already writing a new segments file when it commits
(in today's code).
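
As a sketch of what the reader does with that generation (Python for
brevity, not the real code): the `<segment>.del.<N>` naming follows the
description above, but treating generation 0 as "no .del file yet" and
storing one byte per deleted doc id are assumptions made only to keep
the example small.

```python
import os
import tempfile

def del_file_name(segment, generation):
    """Return the exact .del file name for a segment, or None if it has none."""
    # Assumption: generation 0 means no deletions file was ever written.
    if generation == 0:
        return None
    return f"{segment}.del.{generation}"

def load_deletions(directory, segment, generation):
    """Load deleted doc ids for the generation recorded in the segments file."""
    name = del_file_name(segment, generation)
    if name is None:
        return set()
    # The segments file said this exact file exists, so a missing file is
    # a real error (FileNotFoundError propagates), not "no deletions".
    with open(os.path.join(directory, name), "rb") as f:
        data = f.read()
    # Toy encoding: one byte per deleted doc id.  A zero-length file
    # therefore yields an empty set, i.e. the "undelete all" marker.
    return set(data)
```

Because the reader never has to probe for the file, "didn't exist" and
"was just deleted" can no longer be confused.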

> One would also have to worry about partially deleted segments on
> Windows... while removing a segment, some of the files might fail to
> delete (due to still being open) and some might succeed.

Yes, I think this case is handled correctly.  Once all searchers using
those old segments are closed, then the next commit that runs will
remove those files (just like it does today).

Not having to read/write the deletable file should make things more
robust (there was a thread recently on users list about hitting an
exception because a file couldn't be deleted on Windows).
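
One way to picture cleanup without a deletable file, again as a Python
sketch under assumptions rather than the actual change: each commit
works out which files the current segments still reference and simply
tries to remove everything else.  `remove_unreferenced` and its
`remove` parameter are hypothetical names.

```python
import os
import tempfile

def remove_unreferenced(directory, referenced, remove=os.remove):
    """Try to delete every file not in `referenced`; return those that failed."""
    still_present = []
    for name in sorted(os.listdir(directory)):
        if name in referenced:
            continue
        try:
            remove(os.path.join(directory, name))
        except OSError:
            # e.g. Windows refuses because a searcher still has the file
            # open.  Nothing is recorded; the next commit just retries.
            still_present.append(name)
    return still_present
```

A partially deleted segment is harmless here, since whatever survives
one pass is picked up by the directory listing on the next.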

> This idea is worth kicking around more for the future (maybe for when
> the index format changes again), but it's probably too much change for
> right now (Lucene 2.0.x), right?

Yes, I don't think this should go in for a 2.0.x point release.  Maybe
for a 2.1.x?  Or I guess whenever we next have a major enough release
to allow changing of the index format.

I do think the benefits are sizable, though, so we should not wait
too long :) The number of poor people who post to the users list with
errant Access Denied, FileNotFound, lock obtain timed out, etc.,
exceptions is quite large.  There was just one today that I'm going to
go try to respond to next.  Plus the prospect of working just fine on
remote filesystems is great!

OK I will keep working through this & running stress tests on it to
see if I can uncover any issues...

