lucene-solr-user mailing list archives

From Grant Ingersoll
Subject Re: index size before and after commit
Date Thu, 01 Oct 2009 13:25:07 GMT
It may take some time before resources are released and garbage  
collected, so that may be part of the reason why things hang around  
and du doesn't report much of a drop.

On Oct 1, 2009, at 8:54 AM, Phillip Farber wrote:

> I am trying to automate a build process that adds documents to 10  
> shards over 5 machines and need to limit the size of a shard to no  
> more than 200GB because I only have 400GB of disk available to  
> optimize a given shard.
> Why does the size (du) of an index typically decrease after a  
> commit?  I've observed a decrease in size of as much as from 296GB  
> down to 151GB or as little as from 183GB to 182GB.  Is that size  
> after a commit close to the size the index would be after an  
> optimize?  For that matter, are there cases where optimization can  
> take more than 2x?  I've heard of cases but have not observed them  
> in my system.

I seem to recall a case where it can be 3x, but I don't know that it  
has been observed much.
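
To put numbers on the 2x versus 3x question, here is a minimal sketch of the headroom arithmetic. It assumes "Nx" means the peak disk usage during an optimize is N times the index size, and the function name `max_shard_size_gb` is made up for illustration; the 400GB figure is the one from the original question.

```python
def max_shard_size_gb(disk_gb: float, peak_factor: float) -> float:
    """Largest index size whose optimize peak (size * peak_factor) still fits on disk.

    peak_factor is an assumption about how much space an optimize needs
    relative to the index size (2 for the usual case, 3 for the rare one).
    """
    if peak_factor <= 0:
        raise ValueError("peak_factor must be positive")
    return disk_gb / peak_factor


if __name__ == "__main__":
    for factor in (2, 3):
        limit = max_shard_size_gb(400, factor)
        print(f"{factor}x peak: keep the shard under about {limit:.0f} GB")
```

Under these assumptions a 2x peak gives the 200GB limit already in use, while budgeting for a 3x peak would mean capping a shard at roughly 133GB.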

> I only do adds to the shards, never query them. An LVM snapshot of  
> the shard receives the queries.
> Is doing a commit before I take a du a reliable way to gauge the  
> size of the shard?  It is really bad news to allow a shard to go  
> over 200GB in my use case.  How do others manage this problem of 2x  
> space needed to optimize with "limited" disk space?

Do you need to optimize at all?
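
If the size check has to be scripted as part of the build, something along these lines might serve as a starting point. It is only a sketch: `dir_size_bytes` and `over_limit` are hypothetical helper names, the 200GB default is taken from the question, and it sums apparent file sizes rather than allocated blocks, so the result can differ somewhat from what du prints.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the apparent sizes of all regular files under path (not disk blocks)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except FileNotFoundError:
                # A file may be deleted between listing and stat,
                # for example while the index is being updated.
                pass
    return total


def over_limit(path: str, limit_gb: float = 200) -> bool:
    """Return True if the directory is larger than limit_gb (GB = 1024**3 bytes)."""
    return dir_size_bytes(path) > limit_gb * 1024 ** 3
```

A build script could call `over_limit(index_dir)` after each commit and stop adding documents to that shard once it returns True. Given the point above about resources taking time to be released, a reading taken immediately after a commit may still include files that are about to go away.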

Grant Ingersoll
