lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: How much disk space does optimize really take
Date Wed, 07 Oct 2009 19:22:45 GMT
Oops, sent before I finished.  "Partial Optimize" aka "maxSegments" is a
recent Solr 1.4/Lucene 2.9 feature.
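
For illustration, a minimal sketch of how a partial optimize might be requested through Solr's XML update handler, committing first as Yonik suggests further down the thread. The URL, the function names, and the use of Python are all assumptions for the example, not anything from the original setup; only the maxSegments attribute on the optimize message is the feature being discussed.

```python
import urllib.request
from xml.sax.saxutils import quoteattr

# Hypothetical update URL; adjust host, port, and path for your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def commit_xml():
    """XML update message asking Solr to commit pending changes."""
    return "<commit/>"


def optimize_xml(max_segments=None):
    """XML update message for an optimize.

    With max_segments set, the message asks for a merge down to at most
    that many segments (a partial optimize) rather than a single segment.
    """
    if max_segments is None:
        return "<optimize/>"
    if max_segments < 1:
        raise ValueError("max_segments must be >= 1")
    return "<optimize maxSegments=%s/>" % quoteattr(str(max_segments))


def post_update(xml, url=SOLR_UPDATE_URL):
    """POST one XML update message to Solr and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def commit_then_partial_optimize(max_segments, url=SOLR_UPDATE_URL):
    """Commit first, then optimize down to max_segments segments."""
    post_update(commit_xml(), url)
    post_update(optimize_xml(max_segments), url)
```

Calling commit_then_partial_optimize(4) against a running Solr would send the two messages in order; nothing is sent until that function is called.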

As to 2x vs. 3x, the general wisdom is that an optimize on a "simple"
index takes at most 2x disk space, and on a "compound" index takes at
most 3x. "Simple" is the default (*). At Divvio we had the same
problem and it never took up more than 2x.
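
To make the arithmetic concrete, here is a rough sketch using the 192G index and 400G disk mentioned in the quoted thread below. It assumes the 2x and 3x figures describe the total peak footprint, existing index included; the function names are made up for the example.

```python
def peak_disk_gb(index_gb, factor):
    """Estimated peak disk use during optimize.

    Assumption: factor multiplies the current index size and the result
    includes the existing index (2 for "simple", 3 for "compound" or the
    worst case with an open reader, per the rule of thumb above).
    """
    return index_gb * factor


def optimize_fits(index_gb, capacity_gb, factor):
    """True if the estimated peak stays within the disk capacity."""
    return peak_disk_gb(index_gb, factor) <= capacity_gb


# Figures from the thread: a 192G index on a disk with 400G capacity.
print(optimize_fits(192, 400, 2))  # 2x peak is 384G
print(optimize_fits(192, 400, 3))  # 3x peak is 576G
```

By this estimate the 2x case fits within 400G and the 3x case does not, which matches the observation that a failed optimize on that hardware points to something beyond the normal 2x.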

If your index disks are really bursting at the seams, you could try
creating an empty index on a separate disk and merging your large
index into that index. The resulting index will be "mostly optimized".

Lance Norskog

* in solrconfig.xml:
<useCompoundFile>false</useCompoundFile>

On 10/7/09, Phillip Farber <pfarber@umich.edu> wrote:
> Wow, this is weird.  I commit before I optimize.  In fact, I bounce
> tomcat before I optimize just in case. It makes sense, as you say, that
> then "the open reader can only be holding references to segments that
> wouldn't be deleted until the optimize is complete anyway".
>
> But we're still exceeding 2x. And after the optimize fails, if we then
> do a commit or bounce tomcat, a bunch of segments disappear. I am stumped.
>
> Yonik Seeley wrote:
>> On Wed, Oct 7, 2009 at 1:50 PM, Phillip Farber <pfarber@umich.edu> wrote:
>>> So this implies that for a "normal" optimize, because the Searcher holds
>>> the existing segments open prior to the optimize, we'd always need 3x,
>>> even in the normal case.
>>>
>>> This seems wrong since it is repeatedly stated that in the normal case
>>> only 2x is needed and I have successfully optimized a similar sized
>>> 192G index on identical hardware with a 400G capacity.
>>
>> 2x for the IndexWriter only.
>> Having an open index reader can increase that somewhat... 3x is the
>> absolute worst case, I think, and that can currently be avoided by first
>> calling commit and then calling optimize.  This way the open
>> reader will only be holding references to segments that wouldn't be
>> deleted until the optimize is complete anyway.
>>
>>
>> -Yonik
>> http://www.lucidimagination.com
>


-- 
Lance Norskog
goksron@gmail.com
