lucene-dev mailing list archives

From Stanislav Jordanov <>
Subject design change suggestion
Date Wed, 06 Dec 2006 16:36:23 GMT
Hi guys,

For the purpose of our product we've devised a bunch of small tool 
classes which handle various utility tasks like:
1. IndexRecoverer - assuming the "segments" file is missing or 
corrupted, this tool rebuilds it based on the *.cfs (and other) files 
found in the index dir (excludes files listed in deletable)

2. IndexSplitter - splits an existing index into 2, 3 or more roughly 
equally sized indices. It simply distributes the segment files across 
distinct directories and then uses the IndexRecoverer to rebuild each 
new index's "segments" file

3. IndexMerger - the reverse of IndexSplitter: merges several indices 
into a single index. It uses a modified version of 
IndexWriter.addIndexes that does not call optimize() at the beginning 
or at the end. This way the resulting index is not a single huge cfs 
file, which is desirable in some cases.

4. IndexOptimizer - optimizes an existing index by merging the 'small' 
segments and compacting the large segments (compacting means 'removing 
the deleted docs within them'). It also converts any old-style 
"spilled" segments to the compound file format.
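
A minimal sketch of the file-discovery step described in item 1, 
assuming the directory listing and the contents of 'deletable' have 
already been read into plain collections of file names. SegmentScan and 
liveCompoundSegments are hypothetical names used for illustration; this 
is not the IndexRecoverer itself, and it only looks at *.cfs files.

```java
import java.util.Collection;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of Lucene. */
final class SegmentScan {

    private static final String CFS_SUFFIX = ".cfs";

    /**
     * Returns the names of segments that have a compound (.cfs) file in
     * the directory listing and whose file is not listed as deletable.
     */
    static SortedSet<String> liveCompoundSegments(Collection<String> fileNames,
                                                  Set<String> deletable) {
        SortedSet<String> segments = new TreeSet<>();
        for (String name : fileNames) {
            if (name.endsWith(CFS_SUFFIX) && !deletable.contains(name)) {
                // Strip the suffix to get the segment name, e.g. "_1.cfs" -> "_1".
                segments.add(name.substring(0, name.length() - CFS_SUFFIX.length()));
            }
        }
        return segments;
    }
}
```

A real recovery tool would still have to determine each segment's 
document count and write the "segments" file in the format the index 
version expects; that part is left out here.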

All of the above-mentioned tools are classes within the 
org.apache.lucene.index package, as they use some package-scope methods 
and properties (and they feel like they belong there).

Now the design change suggestion - it is about the 'deletable' related code.
According to the source comments, the delayed deletion of files 
through 'deletable' is required on Windows only, as this OS prevents 
files that are open for reading from being deleted.
Working on the IndexOptimizer tool I found myself in a situation where I 
needed to 'safe delete' a bunch of obsolete segments while having only 
an (FS)Directory and a segment file name. And the 'safe delete' feature 
is in IndexWriter. Then after reviewing the code I came to the 
conclusion that the 'safe delete' feature logically belongs to the 
(FS)Directory class, not to IndexWriter. I was able to move the 
corresponding code from IndexWriter to (FS)Directory. IMO it is better 
this way.
I am attaching (the 2.0.0) modified sources of IndexWriter and 
(FS)Directory for your consideration. (Disclaimer - I can't guarantee my 
changes are bug-free)
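
To illustrate the idea of a directory-level 'safe delete', here is a 
minimal sketch written against plain java.io.File. It is not the 
attached code and does not use Lucene's Directory API; 
SafeDeleteDirectory, safeDelete and retryPending are hypothetical names. 
It also keeps the pending names in memory only, whereas persisting them 
(as the 'deletable' file does) is left out.

```java
import java.io.File;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of deferred deletion owned by the directory. */
final class SafeDeleteDirectory {

    private final File dir;
    // Names of files that could not be deleted yet (e.g. still open on Windows).
    private final Set<String> pending = new LinkedHashSet<>();

    SafeDeleteDirectory(File dir) {
        this.dir = dir;
    }

    /** Tries to delete now; remembers the name if that fails. Returns true if the file is gone. */
    boolean safeDelete(String name) {
        File f = new File(dir, name);
        if (!f.exists() || f.delete()) {
            pending.remove(name);
            return true;
        }
        pending.add(name);
        return false;
    }

    /** Retries every remembered name and returns how many are still pending. */
    int retryPending() {
        for (Iterator<String> it = pending.iterator(); it.hasNext();) {
            File f = new File(dir, it.next());
            if (!f.exists() || f.delete()) {
                it.remove();
            }
        }
        return pending.size();
    }

    /** Returns a copy of the names still waiting to be deleted. */
    Set<String> pendingNames() {
        return new LinkedHashSet<>(pending);
    }
}
```

With something along these lines, a tool that holds only a directory and 
a file name could call safeDelete(name) directly and call retryPending() 
later, without going through IndexWriter.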

Best regards,
