lucene-dev mailing list archives

From "Kevin Oliver" <>
Subject Avoiding segment merges during indexing
Date Thu, 11 Aug 2005 21:13:08 GMT
This is a proposal that is in need of some insights.

In an effort to speed up adding documents to an existing index, we are
pursuing using IndexWriter.addIndexes(Directory[]). In theory this
should work great -- you index your new documents into a new Directory,
then add them to your existing directory, saving you the time spent
merging segments that would be caused by the normal
IndexWriter.addDocument(Document) calls during indexing. 

However, addIndexes() has the property that it calls optimize() both
before and after adding the new directories. This wipes out the
performance boost, and then some. 
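
To make that cost concrete, here is a rough toy model. It is not Lucene
code: the class ToyIndex, its methods, and the segment sizes are all
hypothetical, and it assumes that an optimize merges every segment into one
and costs nothing when there is at most one segment. It only counts how many
documents get rewritten when new segments are added with an optimize before
and after, compared with simply attaching them.

```java
import java.util.ArrayList;
import java.util.List;

class OptimizeCostSketch {

    /** Toy index: only segment sizes (document counts) are tracked. */
    static final class ToyIndex {
        final List<Integer> segments = new ArrayList<>();
        long docsRewritten = 0; // running total of documents copied by merges

        ToyIndex(int... segmentSizes) {
            for (int size : segmentSizes) {
                segments.add(size);
            }
        }

        int totalDocs() {
            int total = 0;
            for (int size : segments) {
                total += size;
            }
            return total;
        }

        /** Assumption: merges everything into one segment; free if already one segment. */
        void optimize() {
            if (segments.size() <= 1) {
                return;
            }
            int total = totalDocs();
            docsRewritten += total; // every document is copied into the merged segment
            segments.clear();
            segments.add(total);
        }

        /** Models the described behaviour: optimize, attach new segments, optimize again. */
        void addIndexesWithOptimize(ToyIndex other) {
            optimize();
            segments.addAll(other.segments);
            optimize();
        }

        /** Models attaching the new segments as they are, with no merge at all. */
        void appendSegmentsOnly(ToyIndex other) {
            segments.addAll(other.segments);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sizes: an existing index of three segments, a new one of two.
        ToyIndex viaAddIndexes = new ToyIndex(100_000, 10_000, 1_000);
        viaAddIndexes.addIndexesWithOptimize(new ToyIndex(1_000, 100));

        ToyIndex viaAppend = new ToyIndex(100_000, 10_000, 1_000);
        viaAppend.appendSegmentsOnly(new ToyIndex(1_000, 100));

        System.out.println("addIndexes-style: " + viaAddIndexes.docsRewritten + " documents rewritten");
        System.out.println("append-only:      " + viaAppend.docsRewritten + " documents rewritten");
    }
}
```

In this model the two optimize passes rewrite the existing documents twice
to add a comparatively small batch, while the append-only path rewrites
nothing. Real merge costs depend on much more than document counts, so treat
the numbers as an illustration only.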

So I found a way to work around this, but I don't like what I've had to
do and I was wondering if anybody has any ideas on what could be done to
make this more pleasant.

It appears that by getting the new segment files into the existing
directory, with the correct segment names, it will work without all of
the optimize calls. Unfortunately, getting the segment names right and
getting the files into the right location is a big ugly hack and is
quite fragile.

Is there a better way? I think an explanation of why the two optimize()
calls are there would help my understanding. Is there a clean way of
doing what I'm proposing? Is there some hidden catch I'm missing, and
have I been going down the wrong path?

It seems to me this would be a great benefit to anyone who does indexing
on existing indexes and wants it to be fast. 

Kevin Oliver
