lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: index size before and after commit
Date Thu, 01 Oct 2009 18:49:39 GMT
I've heard there is a new "partial optimize" feature in Lucene, but it
is not mentioned on the Solr or Lucene wikis, so I cannot advise you
on how to use it.

On a previous project we had a 500GB index for 450m documents. It took
14 hours to optimize. We found that Solr worked well (given enough RAM
for sorting and faceting requests), but that the IT logistics of a
500GB fileset were too much.

Also, if you want your query servers to continue serving while
propagating the newly optimized index, you need 2X space to store both
copies on the slave during the transfer. For us this took 35 minutes
over 1G ethernet.
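As a back-of-the-envelope sketch of the slave-side numbers above (the 500GB index size is from this message; the link speed and the assumption of a dedicated, fully saturated link are illustrative):

```python
# Rough check of slave disk usage and transfer time during index
# replication. Only the 500GB figure comes from the post; the rest
# are illustrative assumptions.

index_size_gb = 500       # size of the newly optimized index
link_gbit_per_s = 1.0     # 1G ethernet, assumed fully dedicated

# The slave keeps serving queries from the old copy while the new
# copy arrives, so it must hold both at once.
peak_slave_space_gb = 2 * index_size_gb

# Theoretical minimum transfer time at line rate (no protocol
# overhead); a real transfer will be slower, and a shorter observed
# time implies only part of the index actually moved.
transfer_seconds = index_size_gb * 8 / link_gbit_per_s  # GB -> Gbit
transfer_minutes = transfer_seconds / 60

print(peak_slave_space_gb)      # 1000
print(round(transfer_minutes))  # 67
```

Note this is only a lower bound at line rate; the 35 minutes reported above suggests the transferred fileset was smaller than the full 500GB.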

On Thu, Oct 1, 2009 at 7:36 AM, Walter Underwood <wunder@wunderwood.org> wrote:
> I've now worked on three different search engines and they all have a 3X
> worst
> case on space, so I'm familiar with this case. --wunder
>
> On Oct 1, 2009, at 7:15 AM, Mark Miller wrote:
>
>> Nice one ;) It's not technically a case where optimize requires > 2x,
>> though, in case the user asking gets confused. It's a case unrelated to
>> optimize that can grow your index. Then you need < 2x for the optimize,
>> since you won't copy the deletes.
>>
>> It also requires that you jump through hoops to delete everything. If
>> you delete everything with *:*, that is smart enough not to just do a
>> delete on every document - it just creates a new index, allowing the
>> old one to be removed very efficiently.
>>
>> Definitely agree on the more disk space.
>>
>> Walter Underwood wrote:
>>>
>>> Here is how you need 3X. First, index everything and optimize. Then
>>> delete everything and reindex without any merges.
>>>
>>> You have one full-size index containing only deleted docs, one
>>> full-size index containing reindexed docs, and need that much space
>>> for a third index.
>>>
>>> Honestly, disk is cheap, and there is no way to make Lucene work
>>> reliably with less disk. 1TB is a few hundred dollars. You have a free
>>> search engine, buy some disk.
>>>
>>> wunder
>>>
>>> On Oct 1, 2009, at 6:25 AM, Grant Ingersoll wrote:
>>>
>>>>> 151GB or as little as from 183GB to 182GB.  Is that size after a
>>>>> commit close to the size the index would be after an optimize?  For
>>>>> that matter, are there cases where optimization can take more than
>>>>> 2x?  I've heard of cases but have not observed them in my system.
>>>>
>>>> I seem to recall a case where it can be 3x, but I don't know that it
>>>> has been observed much.
>>>
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>
>
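Tallying up the 3X worst case Walter describes in the quoted thread (index and optimize, delete everything, reindex without merges, then optimize again) - the sizes below are illustrative placeholders, not figures from the thread:

```python
# Worst-case disk usage for the reindex-then-optimize sequence.
# Sizes are illustrative; only the 3X peak ratio matters.

index_gb = 100  # size of one full copy of the data

# Step 1: index everything and optimize -> one full-size segment set.
old_segments_gb = index_gb

# Step 2: delete everything and reindex without merges. Deletes only
# mark documents, so the old segments keep their full on-disk size,
# and the reindexed documents add a second full-size copy.
new_segments_gb = index_gb

# Step 3: optimize. The merge writes a third copy (deleted docs are
# not copied) before the first two can be dropped.
merge_output_gb = index_gb

peak_gb = old_segments_gb + new_segments_gb + merge_output_gb
print(peak_gb / index_gb)  # 3.0 -> the 3X worst case
```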



-- 
Lance Norskog
goksron@gmail.com
