lucene-dev mailing list archives

From "Adar, Eytan" <>
Subject RE: indexing race condition?
Date Fri, 07 Dec 2001 23:02:56 GMT
I considered batching. 

If it were simply a matter of IDs, I could keep a queue of add/delete events
and clean it up before committing.  For example:

Index contains (d1,d2,d3)
Queue contains (in time order) = 1. delete(d1), 2. add(d1), 3. delete(d1),4.

I would crawl back through the queue and clean it up.  In this example,
running backwards, I would notice that step 3 cancels step 2 and repeats
step 1, so I would remove steps 1 and 2.  Then I would go forward through
the queue and perform each remaining step: 3 (which removes the document
from the index) and 4 (which adds a new copy).
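
A minimal sketch of that cleanup in Java, for the ID-only case.  The names
QueueCompactor, Op and compact are hypothetical and nothing here is Lucene
API.  It assumes that a delete of an ID makes every earlier queued operation
on that ID irrelevant, and that the truncated step 4 in the example is
add(d1), as the description of step 4 suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: compacts a queue of add/delete events keyed by document ID.
final class QueueCompactor {
    enum Kind { ADD, DELETE }

    static final class Op {
        final Kind kind;
        final String id;
        Op(Kind kind, String id) { this.kind = kind; this.id = id; }
        @Override public String toString() { return kind + "(" + id + ")"; }
    }

    // Walk the queue backwards. Once a DELETE for an ID has been seen,
    // every earlier operation on that ID is dropped.
    static List<Op> compact(List<Op> queue) {
        Set<String> sealed = new HashSet<>();   // IDs whose latest DELETE was already kept
        List<Op> kept = new ArrayList<>();
        for (int i = queue.size() - 1; i >= 0; i--) {
            Op op = queue.get(i);
            if (sealed.contains(op.id)) {
                continue;                       // superseded by a later DELETE
            }
            kept.add(op);
            if (op.kind == Kind.DELETE) {
                sealed.add(op.id);
            }
        }
        Collections.reverse(kept);              // restore time order for replay
        return kept;
    }

    public static void main(String[] args) {
        List<Op> queue = Arrays.asList(
            new Op(Kind.DELETE, "d1"),
            new Op(Kind.ADD, "d1"),
            new Op(Kind.DELETE, "d1"),
            new Op(Kind.ADD, "d1"));            // assumed step 4
        System.out.println(compact(queue));     // prints [DELETE(d1), ADD(d1)]
    }
}
```

Replaying the compacted list in order corresponds to performing only steps
3 and 4.  The sketch relies on every operation naming exactly one ID.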

Unfortunately, that's not going to work for me, since I don't always
delete by ID (sometimes it's by term). What can I say? I have a weird app.

So it gets much harder when I need to do:

add(d) -> delete(where term="foo") 

because I can't just quickly compare some id.  I guess I'll have to think
about this more.



-----Original Message-----
From: Doug Cutting []
Sent: Friday, December 07, 2001 2:27 PM
To: 'Lucene Developers List'
Subject: RE: indexing race condition?

> From: Adar, Eytan []
> I could just queue up all the delete requests and execute them (once in a
> while) after I close the index.  The problem is that some of my delete
> operations are actually part of a "replace" procedure (delete then add).
> Waiting on the deletes will mean that I wipe the document totally from the
> index (not what I wanted).

I think the best way to handle this is to re-create the IndexReader you use
for searching only when the index is in a state that you want to search.  The
index as represented by an IndexSearcher instance does not change (unless
you delete documents from it, which you shouldn't).

So you keep one IndexReader for searching.  Batch additions and deletions.
Then re-open the index used for searching when the batched additions and
deletions are complete and the index is stable.
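
A rough sketch of that pattern in Java, kept independent of the Lucene API.
SearchSnapshot, get and refresh are hypothetical names, and the Supplier
stands in for whatever code opens a fresh reader on the stable index.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical holder: searches always see one stable snapshot, which is
// swapped only after a batch of additions and deletions has completed.
final class SearchSnapshot<R> {
    private final Supplier<R> opener;           // assumed: opens a new reader/searcher
    private final AtomicReference<R> current;

    SearchSnapshot(Supplier<R> opener) {
        this.opener = opener;
        this.current = new AtomicReference<>(opener.get());
    }

    // Used by search threads; never changes underneath an in-flight search.
    R get() {
        return current.get();
    }

    // Call once the batch is finished. Returns the previous snapshot so the
    // caller can close it when no search is still using it.
    R refresh() {
        return current.getAndSet(opener.get());
    }

    public static void main(String[] args) {
        AtomicInteger generation = new AtomicInteger();
        SearchSnapshot<String> snapshot =
            new SearchSnapshot<>(() -> "reader-" + generation.incrementAndGet());
        System.out.println(snapshot.get());     // prints reader-1
        // ... apply batched additions and deletions here ...
        String old = snapshot.refresh();
        System.out.println(old + " -> " + snapshot.get()); // prints reader-1 -> reader-2
    }
}
```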

It would be nice if delete could be implemented directly on an IndexWriter,
but internally it would have to open an IndexReader and do the same work it
does presently, so it is actually more efficient to expose this and
encourage batching.


To unsubscribe, e-mail:   <>
For additional commands, e-mail: <>

