lucene-dev mailing list archives

From Doug Cutting
Subject Re: design change suggestion
Date Wed, 06 Dec 2006 19:21:04 GMT
Michael McCandless wrote:
>> 1. IndexRecoverer - assuming the "segments" file is missing or 
>> corrupted, this tool rebuilds it based on the *.cfs (and other) files 
>> found in the index dir (excludes files listed in deletable)
> Excellent.  I know that various cases of "recovering an index" have
> come up on the lists over time.  It would be great to have a single
> tool that can try to correct the different problems that users hit, eg
> removing a single unusable segments file, regenerating the segments
> file, etc.
>> 2. IndexSplitter - splits an existing index in 2, 3 or more relatively 
>> equally sized indices. It simply splits the segments files in distinct 
>> directories and then uses the IndexRecoverer to rebuild each new 
>> Index's segment file
> Seems like a good tool for contrib?

If these rely only on the public index-format spec and public index 
apis, then they could go in contrib, which would be easiest, since 
expectations about back-compatibility and long-term support are lower 
for contrib.

But if they rely on index package internals then they should be 
maintained with the core.  Then the question becomes: are these features 
that we can maintain long-term?  The index implementation will likely 
evolve, and the existing public API should be supported through this 
evolution: APIs must be more durable than implementations.  So, are 
these features things that can be supported through likely 
implementation changes?

I suspect they are.  We've talked about making the postings format more 
flexible, but I have not heard anyone talk about a need to substantially 
alter the segments & merging model.  Are we comfortable adding public 
APIs that depend on that model?

An index splitter is useful with parallel and/or distributed search. 
Splitting on segment boundaries is fairly limited, but perhaps, with 
clever use of IndexWriter.setMaxMergeDocs(), it is sufficient.
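
To make the segment-boundary limitation concrete, here is a rough sketch of how whole segments might be assigned to N target indices so that document counts come out roughly equal. It is not the proposed IndexSplitter and it does not read or write any Lucene files: the class name, the split() method, the segment names and the per-segment document counts are all hypothetical. It uses a plain greedy heuristic (largest segment first, into the currently smallest group), which is my assumption, not something from the proposal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: balances whole segments across target indices.
// Nothing here calls into Lucene; doc counts are supplied by the caller.
final class SegmentSplitSketch {

    /** Groups segment names into `parts` groups with roughly equal total doc counts. */
    static List<List<String>> split(Map<String, Integer> docCounts, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        // Largest segments first, so the small ones can even out the totals later.
        List<Map.Entry<String, Integer>> segments = new ArrayList<>(docCounts.entrySet());
        segments.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            groups.add(new ArrayList<>());
        }
        long[] totals = new long[parts];

        for (Map.Entry<String, Integer> segment : segments) {
            // Pick the group that currently holds the fewest documents.
            int smallest = 0;
            for (int i = 1; i < parts; i++) {
                if (totals[i] < totals[smallest]) {
                    smallest = i;
                }
            }
            groups.get(smallest).add(segment.getKey());
            totals[smallest] += segment.getValue();
        }
        return groups;
    }

    /** Sums the doc counts of the segments in one group. */
    static long total(List<String> group, Map<String, Integer> docCounts) {
        long sum = 0;
        for (String name : group) {
            sum += docCounts.get(name);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up segment names and sizes, for illustration only.
        Map<String, Integer> docCounts = new LinkedHashMap<>();
        docCounts.put("_a", 1000);
        docCounts.put("_b", 1000);
        docCounts.put("_c", 1000);
        docCounts.put("_d", 500);
        docCounts.put("_e", 500);

        for (List<String> group : split(docCounts, 2)) {
            System.out.println(group + " -> " + total(group, docCounts) + " docs");
        }
    }
}
```

The balance can be no better than the largest segment allows: one segment holding most of the documents forces one oversized group. That is why capping segment size with IndexWriter.setMaxMergeDocs() matters here, since many small segments give a grouping like this much more room to even things out.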

