lucene-java-user mailing list archives

From Ivan Vasilev
Subject Re: How to avoid sharing docStore files?
Date Wed, 12 May 2010 14:15:43 GMT
That's fine, Andrzej :) Doing the split in just one pass really matters for
big indexes.
I hope we will be able to use it in our application.

Andrzej Bialecki wrote:
> On 2010-05-12 14:29, Ivan Vasilev wrote:
>> Hi Michael,
>> Thanks for your answer.
>> What we do now:
>> 1. Splitting indexes. We do not do it by reading the index and
>> distributing docs into separate indexes, as MultiPassIndexSplitter does.
>> We do it by binary copying segments to different folders and then
>> recreating the segment descriptor file for each one (we have created a
>> tool for this). Which segment goes to which new index is decided from
>> the segment sizes, calculated so that the new indexes are almost equal
>> in size. If we have a .cfx file, it would be an obstacle for the current
>> division logic.
>> I looked at the class MultiPassIndexSplitter. It splits the index by
>> docs (not by segments). It has a big advantage: the index can be split
>> more evenly (into parts of more similar size). This works even if the
>> index was just optimized and has only one big segment. But it also has
>> disadvantages. The index is read as many times as there are new indexes
>> (which is bad for ~40 GB indexes). Also, the original index remains in
>> place the whole time, which means that if we do the split on one and the
>> same partition we need double the disk space.
>> Maybe we should offer both index split approaches to the user... this
>> depends on higher levels :)
> Hi,
> I wrote the MultiPassIndexSplitter. Yes, multi-pass is problematic with
> large indexes. I'm currently working on a single-pass TrueSplitter :)
> which should be ready within a couple weeks.
> However, even this new tool will make a copy of the original index, so
> you will need twice as much space. But in this case perhaps you could
> put the original index on a network FS, and split it into the target
> partition - the data would be read just once.
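
The size-based assignment of segments to new indexes described in the quoted message could be sketched roughly as follows. This is only an illustration, not the tool mentioned in the thread: the class name SegmentBalancer, the method assign and the segment names and sizes are hypothetical, and a simple largest-first greedy heuristic stands in for whatever calculation the real tool performs. The sketch only decides the grouping; it does not copy files or rebuild segment descriptors, and it ignores shared doc store (.cfx) files, which the thread names as an obstacle to this kind of split.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups segments into target indexes of roughly equal total size.
class SegmentBalancer {

    // Greedy heuristic: take segments largest first and always add the next
    // one to the target whose running total is currently the smallest.
    static List<List<String>> assign(Map<String, Long> segmentSizes, int targets) {
        if (targets < 1) {
            throw new IllegalArgumentException("targets must be >= 1");
        }
        List<Map.Entry<String, Long>> segments = new ArrayList<>(segmentSizes.entrySet());
        segments.sort(Map.Entry.<String, Long>comparingByValue().reversed());

        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < targets; i++) {
            result.add(new ArrayList<>());
        }
        long[] totals = new long[targets];

        for (Map.Entry<String, Long> segment : segments) {
            int smallest = 0;
            for (int i = 1; i < targets; i++) {
                if (totals[i] < totals[smallest]) {
                    smallest = i;
                }
            }
            result.get(smallest).add(segment.getKey());
            totals[smallest] += segment.getValue();
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up segment names and sizes, for illustration only.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("_0", 40L);
        sizes.put("_1", 30L);
        sizes.put("_2", 20L);
        sizes.put("_3", 10L);
        System.out.println(assign(sizes, 2));
    }
}
```

With an index that was just optimized into a single segment, this approach cannot balance anything, which is the case where splitting by docs has the advantage noted in the thread.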
