lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki>
Subject Re: How to avoid sharing docStore files?
Date Wed, 12 May 2010 13:14:41 GMT
On 2010-05-12 14:29, Ivan Vasilev wrote:
> Hi Michael,
> Thanks for your answer.
> What we do now:
> 1. Splitting indexes. We do it not by reading indexes and distributing
> docs in separate indexes like in MultiPassIndexSplitter. We do it by
> binary copping segments to different folders and then recreate segment
> descriptor file for each one (we have created tool for this). The
> decision of which segment to which new index to go is taken by taking
> segment sizes and calculating so that to have almost equal indexes. If
> we have .cfx file this would be an obstacle for current logic of division.
> I saw the class MultiPassIndexSplitter. It offers splitting index by
> docs (not by segments). It has a big advantage - index could be split
> better (to more similar in size parts). It would be done even if index
> was just optimized and we have only one big segment. But it has also
> disadvantages. Index is read as many times as the number of new indexes
> is (it is bad for ~40Gb indexes). Also the original index remains all
> the time this means if we do the split in one and the same partition we
> need double disk space.
> May be we should offer both index split approaches to the user... this
> depends on higher levels :)


I wrote the MultiPassIndexSplitter. Yes, multi-pass is problematic with
large indexes. I'm currently working on a single-pass TrueSplitter :)
which should be ready within a couple weeks.

However, even this new tool will make a copy of the original index, so
you will need twice as much space. But in this case perhaps you could
put the original index on a network FS, and split it into the target
partition - the data would be read just once.

Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message