lucene-solr-user mailing list archives

From Timothy Potter <thelabd...@gmail.com>
Subject Re: Optimization storage issue
Date Sat, 02 Mar 2013 18:54:38 GMT
Hi Manuel,

If you search "optimize" on this mailing list, you'll see that one of
the common suggestions is to avoid optimizing and fine-tune segment
merging instead. So to begin, take a look at your solrconfig.xml and
find out what your merge policy and mergeFactor are set to (note: they
may be commented out, which means segment merging is still enabled
with the default settings). You can experiment with changing the
mergeFactor.
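For reference, here's roughly what those settings look like inside the
<indexConfig> section of solrconfig.xml. The values below are just the
common defaults, not a recommendation for your workload:

```xml
<indexConfig>
  <!-- mergeFactor 10 is the historical default; lower values mean
       fewer, larger segments (more merge I/O, faster searches). -->
  <mergeFactor>10</mergeFactor>
  <!-- TieredMergePolicy is the default policy in recent versions;
       these two ints loosely replace the old mergeFactor knob. -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>
```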

Based on your description of adding and removing a few thousand
documents each day, I'm going to assume your documents are very large
otherwise I can't see how you'd ever notice an impact on query
performance. Is my assumption about the document size correct?

One thing you can try is to set the expungeDeletes attribute to
"true" when you commit, i.e. <commit expungeDeletes="true"/>. This
tells Solr to merge any segments that contain deleted documents.
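You can send that commit over HTTP. Something along these lines should
work (host, port, and core name here are placeholders; adjust for your
deployment):

```shell
# Trigger a commit with expungeDeletes via URL parameters:
curl "http://localhost:8983/solr/mycore/update?commit=true&expungeDeletes=true"

# Or POST the equivalent XML update message:
curl "http://localhost:8983/solr/mycore/update" \
     -H "Content-Type: text/xml" \
     --data-binary '<commit expungeDeletes="true"/>'
```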

Lastly, I'm not sure about your specific questions related to
optimization, but I think it's worth trying the suggestions above and
avoiding optimization altogether. I'm pretty sure the answer to #1 is
no, and for #2, it optimizes independently.
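To make the storage arithmetic in question #1 concrete, here's a quick
sketch with made-up numbers (shard count and sizes are hypothetical,
purely for illustration):

```python
# Optimizing a shard temporarily needs roughly 2x that shard's size
# on disk, since the old and new segments coexist during the rewrite.
num_shards = 10
shard_size_gb = 50                            # assumed size per shard
index_size_gb = num_shards * shard_size_gb    # 500 GB total

# All shards optimizing at once: every shard needs its own scratch
# space simultaneously, so peak usage doubles the whole index.
simultaneous_peak_gb = 2 * index_size_gb      # 1000 GB

# Shards optimizing one at a time against a shared spare pool: only
# one shard's worth of scratch space is ever in use at once.
serial_peak_gb = index_size_gb + shard_size_gb  # 550 GB

print(simultaneous_peak_gb, serial_peak_gb)
```

With these numbers, serializing the optimizations would cut the peak
storage requirement from 1000 GB to 550 GB, which matches the savings
Manuel describes.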

Cheers,
Tim


On Sat, Mar 2, 2013 at 10:24 AM, Manuel Le Normand
<manuel.lenormand@gmail.com> wrote:
> My use case is a quasi-monthly changing index. Every day I index a few
> thousand docs and erase a similar number of older documents, while a few
> documents stay in the index forever (about 20% of my index). After a few
> experiments, I found that leaving the older documents in the index (mostly in
> the *.tim file) significantly slows down my avg qTime, and I came to the
> conclusion that I need to optimize the index once every few days to get rid
> of the older documents.
>
> Optimization requires about 2 times the index storage. As I have many
> shards and one replica for each, and the optimization occurs simultaneously
> on all of them, I need twice the amount of storage of my initial index size,
> while half of it is used very infrequently (optimization takes about an hour).
>
> 1) Is there a possibility of using a storage pool for all shards, so that
> every shard uses the spare storage in series, forcing the optimizations to
> run non-simultaneously? In this case all the storage I'd use would be (total
> index storage + shard storage) instead of twice the total index storage.
>
> 2) When I run optimization on a replicated core, does it copy from its
> leader or does it optimize independently?
>
> Thanks,
> Manu
