lucene-dev mailing list archives

From: "Yonik Seeley"
Subject: Re: Lock-less commits
Date: Fri, 18 Aug 2006 14:01:41 GMT

> The basic idea is to change all commits (from SegmentReader or
> IndexWriter) so that we never write to an existing file that a reader
> could be reading from.  Instead, always write to a new file name using
> sequentially numbered files.  For example, for "segments", on every
> commit, write to the sequence: segments.1, segments.2, segments.3,
> etc.  Likewise for the *.del and *.fN (norms) files that
> SegmentReaders write to.
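
The naming scheme quoted above can be sketched roughly as follows. This is only an illustration in Python, not the proposed patch: the helper names next_segments_name and commit are hypothetical, and the sketch assumes the segments file is just a newline-separated list of segment names.

```python
import os
import re

# Matches commit files named segments.N, where N is a decimal generation.
SEGMENTS_RE = re.compile(r"^segments\.(\d+)$")


def next_segments_name(existing_names):
    """Return the name of the next segments file in the sequence."""
    gens = [int(m.group(1)) for m in map(SEGMENTS_RE.match, existing_names) if m]
    return f"segments.{max(gens, default=0) + 1}"


def commit(directory, segment_names):
    """Write a new segments.N file and return its name.

    Mode "x" (exclusive creation) raises FileExistsError instead of
    overwriting, so a file a reader may be reading is never rewritten.
    """
    name = next_segments_name(os.listdir(directory))
    with open(os.path.join(directory, name), "x") as f:
        f.write("\n".join(segment_names))
    return name
```

The sketch does not address a reader that sees a partially written segments.N file; a real implementation would need to handle that separately.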

Interesting idea...
How do you get around races between opening and deleting?

I assume for the writer, you would
  1) write new segments
  2) write new 'segments.3'
  3) delete unused segments (those referenced by 'segments.2')

But what happens when a reader comes along at point 1.5, say, opens
the latest 'segments.2' file, and then tries to open some of the
segment files at point 3.5, after the writer has deleted them?
I guess the reader could retry... checking for a new segments file.
This could happen more than once (hopefully it wouldn't lead to
starvation... that would be unlikely).
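
A minimal sketch of that retry loop, in Python for brevity. It is an illustration under stated assumptions, not Lucene code: open_latest, load_commit and the list_dir parameter are hypothetical names, and the segments file is assumed to be a whitespace-separated list of segment file names.

```python
import os
import re

SEGMENTS_RE = re.compile(r"^segments\.(\d+)$")


def latest_generation(names):
    """Return the highest N among segments.N names, or None if there are none."""
    gens = [int(m.group(1)) for m in map(SEGMENTS_RE.match, names) if m]
    return max(gens, default=None)


def load_commit(directory, gen):
    """Read segments.<gen> and every file it references."""
    with open(os.path.join(directory, f"segments.{gen}")) as f:
        names = f.read().split()
    contents = {}
    for name in names:
        with open(os.path.join(directory, name), "rb") as seg:
            contents[name] = seg.read()
    return contents


def open_latest(directory, list_dir=os.listdir, max_attempts=10):
    """Load the newest commit, retrying if a writer deleted files under us."""
    gen = latest_generation(list_dir(directory))
    if gen is None:
        raise FileNotFoundError("no segments.N file in " + directory)
    for _ in range(max_attempts):
        try:
            return gen, load_commit(directory, gen)
        except FileNotFoundError:
            # Retry only if a newer commit appeared; otherwise the index
            # really is missing files and the error is propagated.
            newer = latest_generation(list_dir(directory))
            if newer is None or newer <= gen:
                raise
            gen = newer
    raise RuntimeError("gave up after repeated concurrent commits")
```

The max_attempts bound is one way to keep the starvation case from looping forever; the value 10 is arbitrary.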

> We can also get rid of the "deletable" file (and associated errors
> renaming -> deletable) because we can compute what's
> deletable according to "what's not referenced by current segments
> file."

If the segments file is written last, how does an asynchronous deleter
tell what will be part of a future index?  I guess it's doable if all
file types have sequence numbers...
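
To make the sequence-number idea concrete, here is a rough Python sketch. It assumes a purely hypothetical convention in which every index file name ends in ".N", where N is the generation of the commit that wrote it; compute_deletable and generation_of are illustrative names, not anything in Lucene.

```python
import re

# Hypothetical convention: every index file name ends in ".N".
GEN_SUFFIX_RE = re.compile(r"\.(\d+)$")


def generation_of(name):
    """Return the trailing generation number of a file name, or None."""
    m = GEN_SUFFIX_RE.search(name)
    return int(m.group(1)) if m else None


def compute_deletable(all_files, referenced, current_gen):
    """List files that the current commit no longer needs.

    Files with a generation newer than the current commit are skipped,
    since they may belong to a commit that is still being written.
    Files with no generation suffix are kept, to be conservative.
    """
    keep = set(referenced) | {f"segments.{current_gen}"}
    deletable = []
    for name in all_files:
        if name in keep:
            continue
        gen = generation_of(name)
        if gen is None or gen > current_gen:
            continue
        deletable.append(name)
    return sorted(deletable)
```

For example, with segments.3 current and referencing only "_b.3", a file "_c.4" written for an upcoming commit is left alone, while "_a.2" and "segments.2" are reported as deletable.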

-Yonik
Solr, the open-source Lucene search server

On 8/18/06, Michael McCandless wrote:
> I think it's possible to modify Lucene's commit process so that it
> does not require any commit locking at all.
>
> This would be a big win because it would prevent all the various messy
> errors (FileNotFound exceptions on instantiating an IndexReader,
> Access Denied errors on renaming -> X, Lock obtain timed out
> from leftover lock files, etc.) that Lucene users keep coming across.
>
> Also, indices against remote (NFS, Samba) filesystems, where current
> locking has known issues that users seem to hit fairly often, would
> then be fine.
>
> I'd like to get feedback on this idea (am I missing something?) and if
> there are no objections I can submit a full patch.
>
> I have an initial implementation that passes all unit tests.  It also
> runs fine with a writer/searcher stress test: the writer adding docs
> to an index stored on NFS, and a multi-threaded reader on a separate
> (Windows XP, mounted over Samba) machine continuously re-instantiating
> an IndexSearcher and doing a search against the same index.

> Disk usage should be the same, even temporarily when merging, because
> we still remove the old segments after merging.

> This means IndexReader, on opening an index, finds the most recent
> segments file and loads it.  If, when loading the segments, it hits a
> FileNotFound exception, and a newer segments file has appeared, it
> retries against the new one.
>
> This does entail small changes to the index file format.
> Specifically, file names are different (they have new .N suffixes),
> and the contents of the segments file are expanded to contain details
> about which del/norm files are current for each segment.
>
> Note that the write lock is still needed to catch people accidentally
> creating two writers on one index.  But since this lock file isn't
> obtained/released as frequently as the current commit lock, I would
> expect fewer issues from it.
>
> This change should be fully backwards compatible, meaning the new code
> would read the old index format, and I believe existing APIs should not
> change.  But if there are applications (maybe Solr?) that peek inside
> the index files expecting (for example) a file named "segments" to be
> there, then such cases would need to be fixed.
>
> Mike