lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: Lock-less commits
Date Fri, 18 Aug 2006 15:57:24 GMT

>> The basic idea is to change all commits (from SegmentReader or
>> IndexWriter) so that we never write to an existing file that a reader
>> could be reading from.  Instead, always write to a new file name using
>> sequentially numbered files.  For example, for "segments", on every
>> commit, write to the sequence: segments.1, segments.2, segments.3,
>> etc.  Likewise for the *.del and *.fN (norms) files that
>> SegmentReaders write to.
> Interesting idea...
> How do you get around races between opening and deleting?
> I assume for the writer, you would
>  1) write new segments
>  2) write new 'segments.3'
>  3) delete unused segments (those referenced by 'segments.2')
> But what happens when a reader comes along at point 1.5, say, opens
> the latest 'segments.2' file, and then tries to open some of the
> segments files at 3.5?
> I guess the reader could retry... checking for a new segments file.
> This could happen more than once (hopefully it wouldn't lead to
> starvation... that would be unlikely).

Yes, exactly.
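Concretely, the ordering you describe could be sketched like this, with an in-memory map standing in for the index directory (the class and method names are illustrative only, not Lucene's actual code):

```java
import java.util.*;

// Illustrative sketch of the commit ordering quoted above; a HashMap
// (file name -> contents) stands in for the real index directory.
public class CommitOrderSketch {
    static Map<String, String> dir = new HashMap<>();

    // Commit generation prevGen+1:
    //   (1) write the new segment files,
    //   (2) write the new segments.(N+1) file,
    //   (3) only then delete files referenced solely by the old segments.N.
    static void commit(int prevGen, List<String> newSegmentFiles) {
        for (String f : newSegmentFiles) {
            dir.put(f, "data");                                   // step 1
        }
        dir.put("segments." + (prevGen + 1),
                String.join(",", newSegmentFiles));               // step 2
        String old = dir.remove("segments." + prevGen);           // step 3
        if (old != null) {
            for (String f : old.split(",")) {
                if (!newSegmentFiles.contains(f)) dir.remove(f);  // unreferenced by new gen
            }
        }
    }
}
```

The key invariant is that no existing file is ever overwritten in place: a reader that already opened segments.N can keep reading its files right up until step 3 removes them.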

And specifically, the reader only retries if, on hitting a 
FileNotFoundException, it then checks & sees that a newer segments file 
is available.  This way, if there is a "true" FileNotFoundException due 
to some sort of index corruption or something, we will [correctly] throw it.
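A rough sketch of that retry logic, again with an in-memory map standing in for the directory (illustrative names only, not the actual implementation):

```java
import java.io.FileNotFoundException;
import java.util.*;

// Illustrative sketch of the reader-side retry described above; a HashMap
// (file name -> contents) stands in for the real index directory.
public class RetryOpenSketch {
    static Map<String, String> dir = new HashMap<>();

    // Latest generation N among the segments.N files, or -1 if none.
    static int latestSegmentsGen() {
        int latest = -1;
        for (String name : dir.keySet()) {
            if (name.startsWith("segments.")) {
                latest = Math.max(latest,
                        Integer.parseInt(name.substring("segments.".length())));
            }
        }
        return latest;
    }

    // Pretend to open the index as of segments.gen; throws if a referenced
    // file has already been deleted by a concurrent commit.
    static String openIndex(int gen) throws FileNotFoundException {
        String segments = dir.get("segments." + gen);
        if (segments == null) throw new FileNotFoundException("segments." + gen);
        for (String referenced : segments.split(",")) {
            if (!dir.containsKey(referenced)) throw new FileNotFoundException(referenced);
        }
        return segments;
    }

    // Retry only when a newer segments file has appeared in the meantime;
    // otherwise the FileNotFoundException is "real" and must propagate.
    static String openWithRetry() throws FileNotFoundException {
        while (true) {
            int gen = latestSegmentsGen();
            try {
                return openIndex(gen);
            } catch (FileNotFoundException e) {
                if (latestSegmentsGen() <= gen) throw e;  // no newer commit: real error
                // else: a commit slipped in between; loop and retry on the new gen
            }
        }
    }
}
```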

It could in theory lead to starvation but this should be rare in 
practice unless you have an IndexWriter that's constantly committing.

Also note that this should be no worse than what we have today, where 
you would also likely hit starvation and get a "Lock obtain timed out" 
exception thrown (eg see ...).

In my stress test (a shared index, with the writer accessing it over NFS 
and 3 reader threads doing "open IndexSearcher; search" over and over 
via a Samba share), the IndexSearchers do retry, but so far never more 
than once.  Of course this will depend heavily on the details of the use 
case ...

>> We can also get rid of the "deletable" file (and associated errors
>> renaming -> deletable) because we can compute what's
>> deletable according to "what's not referenced by current segments
>> file."
> If the segments file is written last, how does an asynchronous deleter
> tell what will be part of a future index?  I guess it's doable if all
> file types have sequence numbers...

Well, in my current implementation I don't have a truly asynchronous 
deleter.  If I did have that then you're right I'd need to not delete 
the "new and in progress" files.  We could consider something like that 
in the future ...

Instead, I still do all deletes [synchronously] in the same places as 
the current code, with the write lock held.  For example, during a 
commit we delete old segments immediately after writing the new 
segments file, and then again after creating a compound file (if the 
index is using compound files).  Likewise when a SegmentReader commits 
new *.del or norms files.
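The "deletable = whatever the current segments file does not reference" computation could be sketched as follows (hypothetical names, not the real code):

```java
import java.util.*;

// Illustrative sketch of computing deletable files as "everything not
// referenced by the newest segments.N", replacing the old "deletable" file.
public class DeletableSketch {
    // Given all file names in the index directory, the set of files the
    // current segments file references, and the current generation,
    // return what may safely be deleted.
    static List<String> deletable(Collection<String> allFiles,
                                  Set<String> referenced,
                                  int currentGen) {
        List<String> dead = new ArrayList<>();
        for (String name : allFiles) {
            if (name.equals("segments." + currentGen)) continue;  // keep the live segments file
            if (!referenced.contains(name)) dead.add(name);       // unreferenced -> deletable
        }
        return dead;
    }
}
```

Since this is derived purely from the directory listing plus the current segments file, there is no separate "deletable" state to keep consistent (and no renaming errors to handle).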

Also, one neat possibility this could lead to in the future is 
explicitly keeping "virtual snapshots" at points in time, but within a 
single index (vs, eg, the hard-link snapshots that Solr does).

For example, if you want to index a bunch of docs but not make them 
visible for searching yet, then with the current code you have to make 
sure never to restart an IndexSearcher.  But if your app server goes 
down (say), then all IndexSearchers will come back up and make your 
indexed docs visible.

But with this new approach (plus some additional code that I'm not 
planning on doing for starters), it would be possible for an 
IndexSearcher to explicitly say "I'd like to re-open the snapshot of the 
index as of 3 days ago", for example.  This would require more smarts in 
the reclaiming of old files ... but at least this could be a first step 
towards that.

