lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: Lock-less commits
Date Fri, 18 Aug 2006 20:52:54 GMT

> i don't think these changes are going to work. With multiple writers and 
> or readers doing deletes, without serializing the writes you will  have 
> inconsistencies - and the del files will need to be unioned.
> That is:
> station A opens the index
> station B opens the index
> station A deletes some documents creating segment.del1
> station B deletes some documents creating segment.del2
> when station C opens the index (or when the segment is merged) del1 and 
> del2 need to be merged.
> The locking enforces that writers are serialized - you cannot remove 
> this restriction unless you merge the writes when reading.

Sorry, I should be very clear: I am not proposing we remove the write
lock.  The write lock must definitely remain (for the reasons /
examples you list above).  Only one writer can be open at a time
against the index.

The commit lock, which ensures that no writer is changing the index at
the moment an IndexReader opens it (and vice versa), is, I think, the
more problematic of the two.

The reason is, the write lock is really a safety net: it's up to you
to use Lucene in such a way that you never try to create two writers
at the same time.  You can use IndexModifier.  Or you can do your own
switching between IndexReader/IndexWriter.  Or you can use the patch
in LUCENE-565 so that IndexWriter is able to delete documents.  But in
all these cases, the write lock is really just a safety net: it
catches you if you accidentally violate this constraint and then you
go and fix your code accordingly.  You would typically catch this in
development / testing because it's a coding / design error.
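
To make the "safety net" idea concrete, here is a rough sketch of a
fail-fast write lock.  This is not Lucene's actual lock code: the class
name WriteLockSketch and the file name "write.lock" are made up for
illustration, and it assumes a local filesystem where file creation is
atomic (exactly the assumption that breaks on remote stores).

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a write lock used purely as a safety net.
final class WriteLockSketch implements AutoCloseable {
    private final Path lockFile;

    private WriteLockSketch(Path lockFile) {
        this.lockFile = lockFile;
    }

    // Fails immediately if another writer already holds the lock.
    static WriteLockSketch obtain(Path indexDir) throws IOException {
        // "write.lock" is an illustrative name, not taken from the thread.
        Path lockFile = indexDir.resolve("write.lock");
        try {
            // createFile checks for existence and creates in one atomic step.
            Files.createFile(lockFile);
        } catch (FileAlreadyExistsException e) {
            throw new IllegalStateException(
                "another writer already holds " + lockFile, e);
        }
        return new WriteLockSketch(lockFile);
    }

    @Override
    public void close() throws IOException {
        Files.deleteIfExists(lockFile);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index");
        try (WriteLockSketch first = obtain(dir)) {
            try {
                obtain(dir); // a second writer is a coding error
            } catch (IllegalStateException expected) {
                System.out.println("second writer rejected");
            }
        }
        obtain(dir).close(); // succeeds once the first writer has closed
    }
}
```

The point is that the second obtain() fails loudly rather than waiting,
so the mistake shows up in development / testing.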

The commit lock is more troublesome because it really serves an active
purpose in typical Lucene apps when there's otherwise no app level
logic to synchronize opening an IndexReader vs when a writer is
committing.  The writers can commit whenever they want to (well,
IndexWriter at least).  And IndexReader initialization is often
unpredictable (whenever you restart your app server instance, etc.).
So the timing of these events does require active serialization as
things stand now.

Because of this, an index stored on a remote store (e.g., NFS, Samba),
where our current locking implementation is known to fail silently,
will eventually cause an errant FileNotFound or Access Denied
exception.  And this is insidious because it may work fine during
initial development and testing only to strike after some time in
production.  This is why I'd like to change commits to not require
locking at all (by never re-using the same file name), while keeping
the write locking.
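
As a rough illustration of "never re-using the same file name", here is
a minimal sketch of write-once, generation-numbered commits.  Everything
in it is an assumption made for the example: the class name
GenerationCommit, the "segments_N" / "pending_N" naming, and the retry
count are hypothetical, not an actual patch.  It also assumes a single
writer at a time (the write lock stays) and a store where a rename
within one directory is atomic.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: each commit gets a fresh file name, so readers
// never need a commit lock to avoid seeing a file change under them.
final class GenerationCommit {
    static final String PREFIX = "segments_"; // illustrative naming

    // Returns the highest committed generation, or -1 if there is none.
    static long latestGeneration(Path dir) throws IOException {
        long max = -1;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, PREFIX + "*")) {
            for (Path p : files) {
                String suffix = p.getFileName().toString().substring(PREFIX.length());
                try {
                    max = Math.max(max, Long.parseLong(suffix));
                } catch (NumberFormatException ignored) {
                    // not a commit file; skip it
                }
            }
        }
        return max;
    }

    // Writer side: assumes the write lock already guarantees one writer.
    static long commit(Path dir, String contents) throws IOException {
        long next = latestGeneration(dir) + 1;
        Path pending = dir.resolve("pending_" + next);
        Files.write(pending, contents.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW);
        // Publish under the new name only once the file is fully written.
        Files.move(pending, dir.resolve(PREFIX + next), StandardCopyOption.ATOMIC_MOVE);
        return next;
    }

    // Reader side: no lock; re-list and retry if the chosen file vanished.
    static String readLatest(Path dir) throws IOException {
        for (int attempt = 0; attempt < 5; attempt++) {
            long gen = latestGeneration(dir);
            if (gen < 0) {
                throw new NoSuchFileException("no commit found in " + dir);
            }
            try {
                byte[] bytes = Files.readAllBytes(dir.resolve(PREFIX + gen));
                return new String(bytes, StandardCharsets.UTF_8);
            } catch (NoSuchFileException e) {
                // removed between listing and opening; look again
            }
        }
        throw new IOException("could not read a stable commit in " + dir);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index");
        commit(dir, "first commit");
        commit(dir, "second commit");
        System.out.println(readLatest(dir));
    }
}
```

A reader that loses the race simply re-lists the directory and picks up
whatever generation is current, instead of relying on a lock that may
not work on the underlying store.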

