lucene-dev mailing list archives

From "Xiaozheng Ma" <>
Subject RE: deleting documents from index
Date Thu, 01 Sep 2005 14:18:04 GMT
Indexing into a single index from multiple threads needs to be
serialized: you need to synchronize the calls to
IndexWriter.addDocument(), otherwise Lucene will throw exceptions. After
all, Lucene uses file-based locking to ensure that only one writer can
modify an index at a time.
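A minimal, library-free sketch of the serialization advice above. It assumes a hypothetical `DocumentSink` interface as a stand-in for `IndexWriter.addDocument()` (Lucene itself is not used), and funnels every thread through one `synchronized` method so that at most one add runs at a time. The `maxConcurrent` counter exists only to make that observable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializedIndexing {

    /** Hypothetical stand-in for IndexWriter.addDocument(); not a Lucene type. */
    interface DocumentSink {
        void addDocument(String doc);
    }

    /** Wraps a sink so that callers on any thread add documents strictly one at a time. */
    static final class SerializedSink implements DocumentSink {
        private final DocumentSink delegate;
        private final AtomicInteger active = new AtomicInteger();
        private final AtomicInteger maxConcurrent = new AtomicInteger();

        SerializedSink(DocumentSink delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized void addDocument(String doc) {
            // Record how many callers are inside at once; with synchronized it never exceeds 1.
            maxConcurrent.accumulateAndGet(active.incrementAndGet(), Math::max);
            try {
                delegate.addDocument(doc);
            } finally {
                active.decrementAndGet();
            }
        }

        int maxConcurrent() {
            return maxConcurrent.get();
        }
    }

    /** Runs `threads` workers that each add `perThread` docs; returns {docCount, maxConcurrent}. */
    public static int[] run(int threads, int perThread) {
        List<String> index = new ArrayList<>(); // deliberately not thread-safe
        SerializedSink sink = new SerializedSink(index::add);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int worker = t;
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    sink.addDocument("doc-" + worker + "-" + i);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { index.size(), sink.maxConcurrent() };
    }

    public static void main(String[] args) {
        int[] result = run(8, 1000);
        System.out.println("documents=" + result[0] + " maxConcurrent=" + result[1]);
    }
}
```

With `synchronized` in place, `main` prints `documents=8000 maxConcurrent=1`; without it, the unsynchronized `ArrayList` behind the sink could lose documents or throw.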

In your situation, I believe that if you have multiple threads indexing
new docs into the same index, you will still have the same problem. But I
guess you probably have only one thread doing the indexing and another
one deleting documents from the index by querying ids.
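For the one-indexer, one-deleter setup guessed at above, one option is to make both paths acquire the same lock, so a delete waits for an in-flight add instead of colliding with it. The sketch below is an in-memory model only: `SharedIndexGate` and its `TreeSet` "index" are hypothetical stand-ins, not Lucene classes, and in a real application the lock would be assumed to guard whatever opens the index for adding or deleting.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;

public class SharedIndexGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Set<Integer> index = new TreeSet<>(); // hypothetical stand-in for the on-disk index

    /** Indexing path: adds a document id while holding the shared lock. */
    public void add(int id) {
        lock.lock();
        try {
            index.add(id);
        } finally {
            lock.unlock();
        }
    }

    /** Deletion path: returns true if the id was present and has been removed. */
    public boolean delete(int id) {
        lock.lock();
        try {
            return index.remove(id);
        } finally {
            lock.unlock();
        }
    }

    /** Copy of the current contents, taken under the same lock. */
    public Set<Integer> snapshot() {
        lock.lock();
        try {
            return new TreeSet<>(index);
        } finally {
            lock.unlock();
        }
    }

    /** One thread indexes ids 0..n-1 while another deletes the even ids as they appear. */
    public static Set<Integer> run(int n) {
        SharedIndexGate gate = new SharedIndexGate();
        Thread indexer = new Thread(() -> {
            for (int i = 0; i < n; i++) gate.add(i);
        });
        Thread deleter = new Thread(() -> {
            for (int i = 0; i < n; i += 2) {
                while (!gate.delete(i)) Thread.yield(); // wait until the indexer has added it
            }
        });
        indexer.start();
        deleter.start();
        try {
            indexer.join();
            deleter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return gate.snapshot();
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // [1, 3, 5, 7, 9]
    }
}
```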

One solution to multithreaded indexing on the same index is to split
the indexing process into independent pieces, each using its own index
directory: each thread indexes different docs, and at some point the
separate indexes are merged into one index if you like. In the meantime,
deletions can be applied to the previously merged index whenever the
merge just mentioned is not running (i.e., the index is not locked).
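The split-then-merge scheme can be modeled without Lucene as a sketch: each worker builds a private partial "index" (no locking is needed because nothing is shared), the partials are then merged in one step, and deletions against the merged index share a lock with the merge so the two never overlap. `PartitionedIndexing`, its in-memory lists, and the method names are hypothetical illustrations; in the message the merge itself is done by `IndexWriter.addIndexes`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedIndexing {
    // In-memory stand-in for the merged index; every access is synchronized on `this`,
    // so a merge and a delete can never run at the same time.
    private final List<String> merged = new ArrayList<>();

    /** Each worker "indexes" its own partition into a private list; nothing is shared. */
    public static List<List<String>> buildPartitions(List<List<String>> partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> docs : partitions) {
                Callable<List<String>> task = () -> {
                    List<String> own = new ArrayList<>(); // stand-in for a per-thread index directory
                    for (String doc : docs) own.add(doc);
                    return own;
                };
                futures.add(pool.submit(task));
            }
            List<List<String>> built = new ArrayList<>();
            for (Future<List<String>> f : futures) built.add(f.get());
            return built;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Merge step: plays the role writer.addIndexes(...) has in the message. */
    public synchronized void merge(List<List<String>> built) {
        for (List<String> part : built) merged.addAll(part);
    }

    /** Deletions go against the previously merged index and wait while a merge holds the lock. */
    public synchronized boolean delete(String doc) {
        return merged.remove(doc);
    }

    public synchronized int size() {
        return merged.size();
    }

    public static void main(String[] args) {
        PartitionedIndexing index = new PartitionedIndexing();
        List<List<String>> parts = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"), Arrays.asList("d", "e"));
        index.merge(buildPartitions(parts));
        index.delete("c");
        System.out.println(index.size()); // 4
    }
}
```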

The merge code looks like this:

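import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

        // Each directory in fileList holds a complete, independent index.
        Directory[] inds = new Directory[fileList.length];
        for (int i = 0; i < fileList.length; i++) {
            String path = indexPath + "/" + fileList[i];
            inds[i] = FSDirectory.getDirectory(path, false); // open, don't create
        }
        indexPath = indexPath + "/merge";  // merge into $(indexPath)/merge
        File mergeDir = new File(indexPath);
        if (!mergeDir.exists() && !mergeDir.mkdirs()) {
            throw new IOException("cannot make dir: " + indexPath);
        }
        // create = true: start a new, empty index in the merge directory
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), true);
        writer.addIndexes(inds); // merge indexes
        writer.close();
        for (int i = 0; i < fileList.length; i++) {
            // The original loop body is missing from the archived message;
            // closing each source directory here is an assumption.
            inds[i].close();
        }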

Hope this helps!


-----Original Message-----
Sent: Thursday, September 01, 2005 1:28 AM
Subject: deleting documents from index


In order to delete documents from the index more efficiently during
the incremental indexing process, I have implemented batch deletion at
the application level. First I get the internal document ids based
on the query, then I delete only those documents, by internal id,
when the IndexWriter is closed or the index is optimized, since the
internal document ids change only when the index is optimized. Could this
be an issue?
The reason for doing this is that deleting documents from the index in
one thread sometimes fails when another thread is adding new documents
to the same index.
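The batching scheme described in the question can be sketched as a small buffer: ids matched by queries are collected, then applied in one pass at a chosen moment. Everything here is a hypothetical in-memory model (`DeferredDeletes`, the `boolean[]` deletion flags, and the toy `compact` step are illustrations, not Lucene APIs). `compact` only mirrors the premise in the question that internal ids are renumbered when the index is optimized, which is why the buffer has to be flushed before that happens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.function.IntConsumer;

public class DeferredDeletes {
    // Internal ids collected from queries but not yet deleted (sorted, duplicates dropped).
    private final TreeSet<Integer> pending = new TreeSet<>();

    /** Remember the ids a query matched; nothing is deleted yet. */
    public synchronized void markForDeletion(List<Integer> ids) {
        pending.addAll(ids);
    }

    /** Apply every buffered delete in one batch; returns how many ids were applied. */
    public synchronized int flush(IntConsumer deleteById) {
        int applied = pending.size();
        for (int id : pending) deleteById.accept(id);
        pending.clear();
        return applied;
    }

    /** Toy "optimize": drops deleted slots, so surviving docs are renumbered from 0. */
    public static List<String> compact(List<String> docs, boolean[] deleted) {
        List<String> survivors = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted[id]) survivors.add(docs.get(id));
        }
        return survivors;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("d0", "d1", "d2", "d3");
        boolean[] deleted = new boolean[docs.size()];
        DeferredDeletes buffer = new DeferredDeletes();
        buffer.markForDeletion(Arrays.asList(1, 3)); // ids returned by some query
        int applied = buffer.flush(id -> deleted[id] = true); // e.g. after the writer is closed
        List<String> optimized = compact(docs, deleted);
        System.out.println(applied + " " + optimized); // 2 [d0, d2]
        // After compaction "d2" sits at id 1, so an id 2 buffered earlier would no longer refer to it.
    }
}
```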


To unsubscribe, e-mail:
For additional commands, e-mail:
