lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Deferring merging of index segments
Date Fri, 01 Jun 2012 21:25:13 GMT

64% greater index size when you merge at the end is odd.

Can you post the ls -l output of the final index in both cases?
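
If a per-file-type breakdown is easier to compare than raw ls -l output, something like the following could be run against both index directories. This is only a minimal sketch using the standard Java library, not anything from Lucene itself; the class name IndexSizeReport and the method sizeByExtension are hypothetical names made up for this illustration, and the directory paths are whatever you pass on the command line.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Lucene): totals file sizes per extension.
class IndexSizeReport {

    /**
     * Sums the sizes of the regular files directly under dir, grouped by
     * file extension. Files with no '.' in the name (for example a
     * segments_N file) are keyed by their full name.
     */
    public static Map<String, Long> sizeByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot >= 0 ? name.substring(dot + 1) : name;
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    // Usage: java IndexSizeReport <index-dir> [<index-dir> ...]
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg);
            long total = 0;
            for (Map.Entry<String, Long> e : sizeByExtension(Paths.get(arg)).entrySet()) {
                System.out.printf("  %-12s %,15d bytes%n", e.getKey(), e.getValue());
                total += e.getValue();
            }
            System.out.printf("  %-12s %,15d bytes%n", "total", total);
        }
    }
}
```

Running it once on the index built with deferred merging and once on the index merged on the fly should show which file types account for the extra size.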

Are you only adding (not deleting) docs?

Deferring merges like this is perfectly valid to do... but I'm surprised
you see the two approaches taking about the same time.  I would expect
letting Lucene merge as it goes to be faster on net, since merging can
soak up unused IO bandwidth concurrently with indexing.

Mike McCandless

On Tue, May 29, 2012 at 9:42 PM, Vitaly Funstein wrote:
> Hello,
> I am trying to optimize the process of "warming up" an index prior to
> using the search subsystem, i.e. it is guaranteed that no other writes
> or searches can take place in parallel with the warmup. To that
> end, I have been toying with the idea of turning off segment merging
> altogether until after all the data has been written and committed. I
> am currently using Lucene 3.0.3 and migration to a later version is
> not an option in the short term. So, the way I'm going about turning
> merging off is as follows:
> 1. Before warmup, call:
> IndexWriter.setMaxMergeDocs(0);
> IndexWriter.getLogMergePolicy().setMaxMergeMB(0);
> 2. After the warmup task completes, revert the above parameters to
> their defaults, then call:
> IndexWriter.maybeMerge();
> IndexWriter.waitForMerges();
> Now, I compared my results when deferring segment merges using the
> above method, with a test run letting Lucene do the merging on the
> fly. Curiously, the resulting size of indexes on disk is about 64%
> greater in the former case, although the total time to complete the
> warmup is almost the same.
> So I have a few questions:
> - is the approach for deferring segment merging flawed in some way?
> - what could possibly account for the huge difference in file sizes?
> - what else could I possibly try to further speed up index writing
> during the system's "off hours"?
> Thanks,
> -V
