lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: How much disk space does optimize really take
Date Wed, 07 Oct 2009 19:38:22 GMT
Okay - I think I've got you - you're talking about the case of adding a
bunch of docs, not calling commit, and then trying to optimize. I kept
coming at it from a cold optimize. It's making sense to me now.
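The disk-space scenario in this thread can be sketched as back-of-the-envelope arithmetic. This is an illustrative sketch only (the class and method names are made up, not Lucene API): optimize merges all segments into one, so the IndexWriter needs roughly 2x the index size at peak; an open IndexReader pinning an older generation of segments can push that toward 3x in the worst case.

```java
// Back-of-the-envelope disk math for optimizing a Lucene index.
// All names here are illustrative, not part of any Lucene API.
public class OptimizeDiskMath {
    // During optimize, the IndexWriter keeps the old segments on disk
    // until the merged segment is complete: roughly 2x the index size.
    static long writerPeakGb(long indexGb) {
        return 2 * indexGb;
    }

    // If an open IndexReader still pins an older generation of segment
    // files (e.g. docs added but not committed before the optimize),
    // those files cannot be deleted either: roughly 3x worst case.
    static long worstCasePeakGb(long indexGb) {
        return 3 * indexGb;
    }

    public static void main(String[] args) {
        long indexGb = 192; // the ~192G index discussed in this thread
        System.out.println("writer-only peak: " + writerPeakGb(indexGb) + "G");
        System.out.println("worst-case peak:  " + worstCasePeakGb(indexGb) + "G");
    }
}
```

On these numbers, a 400G volume covers the 2x case (384G) but not the 3x case (576G), which would match the failure described below.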

Mark Miller wrote:
> I can't tell why calling a commit or restarting is going to help
> anything - or why you need more than 2x in any case. The only reason I
> can see for this is if you have turned on auto-commit. Otherwise the
> Reader is *always* only referencing what would have to be around anyway.
>
> You're likely just too close to the edge. There are fragmentation
> issues and whatnot when you're dealing with such large files and so
> little space above what you need.
>
> Phillip Farber wrote:
>   
>> Wow, this is weird.  I commit before I optimize.  In fact, I bounce
>> tomcat before I optimize just in case. It makes sense, as you say,
>> that then "the open reader can only be holding references to segments
>> that wouldn't be deleted until the optimize is complete anyway".
>>
>> But we're still exceeding 2x. And after the optimize fails, if we then
>> do a commit or bounce tomcat, a bunch of segments disappear. I am
>> stumped.
>>
>> Yonik Seeley wrote:
>>     
>>> On Wed, Oct 7, 2009 at 1:50 PM, Phillip Farber <pfarber@umich.edu>
>>> wrote:
>>>       
>>>> So this implies that for a "normal" optimize, in every case, due to the
>>>> Searcher holding open the existing segment prior to optimize that we'd
>>>> always need 3x even in the normal case.
>>>>
>>>> This seems wrong since it is repeatedly stated that in the normal
>>>> case only 2x is needed, and I have successfully optimized a
>>>> similar-sized 192G index on identical hardware with a 400G capacity.
>>>>         
>>> 2x for the IndexWriter only.
>>> Having an open index reader can increase that somewhat... 3x is the
>>> absolute worst case I think, and that can currently be avoided by
>>> first calling commit and then calling optimize.  This way the open
>>> reader will only be holding references to segments that wouldn't be
>>> deleted until the optimize is complete anyway.
>>>
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>       


-- 
- Mark

http://www.lucidimagination.com
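The "commit, then optimize" sequence Yonik describes above can be sketched against the Lucene API of that era (IndexWriter.optimize() was later replaced by forceMerge(1)). This is a sketch only: it assumes lucene-core 2.9 on the classpath, and the index path is illustrative.

```java
// Sketch: commit before optimize so an open IndexReader only pins
// segments that must survive the optimize anyway (peak ~2x, not ~3x).
// Assumes Lucene 2.9-era API; /path/to/index is a placeholder.
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class CommitThenOptimize {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(new File("/path/to/index"));
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);

        // ... add or delete documents here ...

        // Commit first: any open reader then only references
        // segments that would have to stay on disk regardless.
        writer.commit();

        // Now the full merge should peak near 2x the index size.
        writer.optimize();
        writer.close();
    }
}
```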



