lucene-solr-user mailing list archives

From Mark Miller <>
Subject Re: index size before and after commit
Date Thu, 01 Oct 2009 13:48:45 GMT
Whoops - the way I have mail come in, it's not easy to tell if I'm replying
to the Lucene or the Solr list ;)

The way Solr works with Searchers and reopen, it shouldn't run into a
situation that requires more than 2x the index size in disk space to
optimize. I won't guarantee it ;) But based on what I know, it
shouldn't happen under normal circumstances.

Mark Miller wrote:
> Phillip Farber wrote:
>> I am trying to automate a build process that adds documents to 10
>> shards over 5 machines and need to limit the size of a shard to no
>> more than 200GB because I only have 400GB of disk available to
>> optimize a given shard.
>> Why does the size (du) of an index typically decrease after a commit? 
>> I've observed a decrease in size of as much as from 296GB down to
>> 151GB or as little as from 183GB to 182GB.  Is that size after a
>> commit close to the size the index would be after an optimize?  
> Likely. Until you commit or close the Writer, the unoptimized index is
> the "live" index. And then you also have the optimized index. Once you
> commit and make the optimized index the "live" index, the unoptimized
> index can be removed (depending on your delete policy, which by default
> only keeps the latest commit point).
>> For that matter, are there cases where optimization can take more than
>> 2x?  I've heard of cases but have not observed them in my system.  I
>> only do adds to the shards, never query them. An LVM snapshot of the
>> shard receives the queries.
> There are cases where it takes over 2x - but they involve using reopen.
> If you have more than one Reader on the index, and only reopen some of
> them, the new Readers created can hold open the partially optimized
> segments that existed at that moment, creating a need for greater than 2x.
>> Is doing a commit before I take a du a reliable way to gauge the size
>> of the shard?  It is really bad news to allow a shard to go over 200GB
>> in my use case.  How do others manage this problem of 2x space needed
>> to optimize with "limited" disk space?
> Get more disk space ;) Or don't optimize. A lower mergefactor can make
> optimizations less necessary.
>> Advice greatly appreciated.
>> Phil

- Mark
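
For anyone scripting the size check Phil describes, here is a minimal sketch of the arithmetic in Python. It takes the 2x figure from this thread at face value as the peak total disk usage during an optimize; that factor is an assumption and is left as a parameter. The function names and the GB constant are hypothetical, not part of Solr or Lucene. Note that it sums apparent file sizes, which can differ slightly from what du reports in disk blocks, and that the number is only meaningful after a commit, since old segments may still be on disk before then.

```python
import os

GB = 1024 ** 3  # bytes per gibibyte (illustrative constant)


def dir_size_bytes(path):
    """Sum the apparent sizes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A segment file can vanish mid-walk when a merge finishes.
                pass
    return total


def can_optimize(index_bytes, disk_bytes, factor=2.0):
    """Return True if factor * index size fits within the disk capacity.

    factor=2.0 is the rule of thumb from this thread, not a guarantee.
    """
    return index_bytes * factor <= disk_bytes
```

With Phil's numbers, can_optimize(200 * GB, 400 * GB) is True, while the 296GB pre-commit reading would fail the same check, which is why the measurement should be taken after the commit.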
