lucene-solr-user mailing list archives

From Rishi Easwaran <rishi.easwa...@aol.com>
Subject Re: Solr Cloud reclaiming disk space from deleted documents
Date Mon, 04 May 2015 14:27:02 GMT
Thanks Shawn. Yeah, a regular optimize might be the route we take if this becomes a
recurring issue.
I remember that in our old multicore deployment, CPU used to spike and the core became
almost unresponsive.

My guess is that with the SolrCloud architecture, any slack from the leader while it is
optimizing is picked up by the replica.
I was searching around for the optimize behaviour of SolrCloud and could not find much
information.

Does anyone have experience running optimize on SolrCloud in a loaded production environment?

Thanks,
Rishi.
 
-----Original Message-----
From: Shawn Heisey <apache@elyograg.org>
To: solr-user <solr-user@lucene.apache.org>
Sent: Mon, May 4, 2015 9:11 am
Subject: Re: Solr Cloud reclaiming disk space from deleted documents


On 5/4/2015 4:55 AM, Rishi Easwaran wrote:
> Sadly with the size of our complex, splitting and adding more HW is not a
> viable long term solution.
> I guess the options we have are to run optimize regularly and/or become
> aggressive in our merges proactively even before solr cloud gets into this
> situation.

If you are regularly deleting most of your index, or reindexing large
parts of it, which effectively does the same thing, then regular
optimizes may be required to keep the index size down, although you must
remember that you need enough room for the core to grow in order to
actually complete the optimize.  If the core is 75-90 percent deleted
docs, then you will not need 2x the core size to optimize it, because
the new index will be much smaller.
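
As a rough illustration of that arithmetic, here is a minimal Python sketch.
It assumes (a simplification, not something Solr guarantees) that index size
scales linearly with the number of live documents and that the old segments
stay on disk until the rewritten index is complete.  The function name and
the figures are hypothetical.

```python
def optimize_disk_estimate(core_size_gb: float, deleted_fraction: float) -> dict:
    """Estimate disk usage while a core is being optimized.

    Assumptions (hypothetical simplifications):
      * index size is proportional to the number of live documents
      * the old segments are kept until the new index is fully written
    """
    if core_size_gb <= 0:
        raise ValueError("core_size_gb must be positive")
    if not 0.0 <= deleted_fraction <= 1.0:
        raise ValueError("deleted_fraction must be between 0 and 1")

    # Size of the rewritten index once deleted docs are dropped.
    new_index_gb = core_size_gb * (1.0 - deleted_fraction)
    # Peak usage: old index and new index coexist on disk.
    peak_gb = core_size_gb + new_index_gb

    return {
        "new_index_gb": new_index_gb,
        "peak_gb": peak_gb,
        "peak_multiple": peak_gb / core_size_gb,
    }


if __name__ == "__main__":
    # Hypothetical core: 100 GB on disk, 80 percent deleted docs.
    print(optimize_disk_estimate(100.0, 0.8))
```

Under those assumptions, a 100 GB core with 80 percent deleted docs would peak
at roughly 120 GB during the optimize, about 1.2x the core size rather than 2x.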

Currently, SolrCloud will always optimize the entire collection when you
ask for an optimize on any core, but it will NOT optimize all the
replicas (cores) at the same time.  It will go through the cores that
make up the collection and optimize each one in sequence.  If your
index is sharded and replicated enough, hopefully that will make it
possible for the optimize to complete even though the amount of disk
space available may be low.
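
For reference, an optimize is requested through the update handler with
optimize=true.  The Python sketch below only shows what such a request could
look like; the host, port, and collection name are hypothetical, and per the
behaviour described above the request is expected to cover the whole
collection no matter which node receives it.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base_url: str, collection: str, max_segments: int = 1) -> str:
    """Build the update-handler URL that asks Solr for an optimize."""
    params = {
        "optimize": "true",
        "maxSegments": str(max_segments),  # merge down to this many segments
        "waitSearcher": "true",            # block until the new searcher is open
    }
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


def request_optimize(base_url: str, collection: str, timeout_s: float = 3600.0) -> int:
    """Send the optimize request and return the HTTP status code.

    An optimize can run for a long time on a large index, hence the
    generous (hypothetical) timeout.
    """
    with urlopen(optimize_url(base_url, collection), timeout=timeout_s) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical node and collection; only prints the URL, sends nothing.
    print(optimize_url("http://localhost:8983/solr", "mycollection"))
```

Since the cores are optimized one after another, it may make sense to send
this during a quiet period and to watch free disk space on each node while
it runs.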

We have at least one issue in Jira where users have asked for optimize
to honor distrib=false, which would allow the user to be in complete
control of all optimizing, but so far that hasn't been implemented.  The
volunteers that maintain Solr can only accomplish so much in the limited
time they have available.

Thanks,
Shawn


 
