lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeff Wartes <>
Subject Re: replica recovery
Date Fri, 20 Nov 2015 01:07:39 GMT

I completely agree with the other comments on this thread with regard to
needing more disk space asap, but I thought I’d add a few comments
regarding the specific questions here.

If your goal is to prevent full recovery requests, you only need to cover
the duration you expect a replica to be unavailable.

If your common issue is GC due to bad queries, you probably don’t need to
cover more than the number of docs you write in your typical full GC
pause. I suspect this is less than 10M.
If your common issue is the length of time it takes you to notice a server
crashed and restart it, you may need to cover something like 10 minutes
worth of docs. I suspect this is still less than 10M.

You certainly don’t need to keep an entire day’s transaction logs. If your
servers routinely go down for a whole day, solve that by fixing your
servers. :)

With respect to ulimit, if Solr is the only thing of significance on the
box, there’s no reason not to bump that up. I usually just set something
like 32k and stop thinking about it. I get hit by a low ulimit every now
and then, but I can’t recall ever having had an issue with it being too

On 11/19/15, 6:21 AM, "Brian Scholl" <> wrote:

>I have opted to modify the number and size of transaction logs that I
>keep to resolve the original issue I described.  In so doing I think I
>have created a new problem, feedback is appreciated.
>Here are the new updateLog settings:
>    <updateLog>
>      <str name="dir">${solr.ulog.dir:}</str>
>      <int 
>      <int name="numRecordsToKeep">10000000</int>
>      <int name="maxNumLogsToKeep">5760</int>
>    </updateLog>
>First I want to make sure I understand what these settings do:
>	numRecordsToKeep: per transaction log file keep this number of documents
>	maxNumLogsToKeep: retain this number of transaction log files total
>During my testing I thought I observed that a new tlog is created every
>time auto-commit is triggered (every 15 seconds in my case) so I set
>maxNumLogsToKeep high enough to contain an entire days worth of updates.
> Knowing that I could potentially need to bulk load some data I set
>numRecordsToKeep higher than my max throughput per replica for 15 seconds.
>The problem that I think this has created is I am now running out of file
>descriptors on the servers.  After indexing new documents for a couple
>hours a some servers (not all) will start logging this error rapidly:
>73021439 WARN  
>.0.0.0:8983}) [   ] o.e.j.s.ServerConnector
> Too many open files
>	at Method)
>	at 
>	at 
>	at 
>	at 
>	at 
>	at 
>	at
>The output of ulimit -n for the user running the solr process is 1024.  I
>am pretty sure I can prevent this error from occurring  by increasing the
>limit on each server but it isn't clear to me how high it should be or if
>raising the limit will cause new problems.
>Any advice you could provide in this situation would be awesome!
>> On Oct 27, 2015, at 20:50, Jeff Wartes <> wrote:
>> On the face of it, your scenario seems plausible. I can offer two pieces
>> of info that may or may not help you:
>> 1. A write request to Solr will not be acknowledged until an attempt has
>> been made to write to all relevant replicas. So, B won’t ever be missing
>> updates that were applied to A, unless communication with B was
>> somehow at the time of the update request. You can add a min_rf param to
>> your write request, in which case the response will tell you how many
>> replicas received the update, but it’s still up to your indexer client
>> decide what to do if that’s less than your replication factor.
>> See 
>> Tolerance for more info.
>> 2. There are two forms of replication. The usual thing is for the leader
>> for each shard to write an update to all replicas before acknowledging
>> write itself, as above. If a replica is less than N docs behind the
>> leader, the leader can replay those docs to the replica from its
>> transaction log. If a replica is more than N docs behind though, it
>> back to the replication handler recovery mode you mention, and attempts
>> re-sync the whole shard from the leader.
>> The default N for this is 100, which is pretty low for a
>> index. It can be changed by increasing the size of the transaction log,
>> (via numRecordsToKeep) but be aware that a large transaction log size
>> delay node restart.
>> See 
>> ig#UpdateHandlersinSolrConfig-TransactionLog for more info.
>> Hope some of that helps, I don’t know a way to say
>> delete-first-on-recovery.
>> On 10/27/15, 5:21 PM, "Brian Scholl" <> wrote:
>>> Whoops, in the description of my setup that should say 2 replicas per
>>> shard.  Every server has a replica.
>>>> On Oct 27, 2015, at 20:16, Brian Scholl <> wrote:
>>>> Hello,
>>>> I am experiencing a failure mode where a replica is unable to recover
>>>> and it will try to do so forever.  In writing this email I want to
>>>> sure that I haven't missed anything obvious or missed a configurable
>>>> option that could help.  If something about this looks funny, I would
>>>> really like to hear from you.
>>>> Relevant details:
>>>> - solr 5.3.1
>>>> - java 1.8
>>>> - ubuntu linux 14.04 lts
>>>> - the cluster is composed of 1 SolrCloud collection with 100 shards
>>>> backed by a 3 node zookeeper ensemble
>>>> - there are 200 solr servers in the cluster, 1 replica per shard
>>>> - a shard replica is larger than 50% of the available disk
>>>> - ~40M docs added per day, total indexing time is 8-10 hours spread
>>>> over the day
>>>> - autoCommit is set to 15s
>>>> - softCommit is not defined
>>>> I think I have traced the failure to the following set of events but
>>>> would appreciate feedback:
>>>> 1. new documents are being indexed
>>>> 2. the leader of a shard, server A, fails for any reason (java
>>>> times out with zookeeper, etc)
>>>> 3. zookeeper promotes the other replica of the shard, server B, to the
>>>> leader position and indexing resumes
>>>> 4. server A comes back online (typically 10s of seconds later) and
>>>> reports to zookeeper
>>>> 5. zookeeper tells server A that it is no longer the leader and to
>>>> with server B
>>>> 6. server A checks with server B but finds that server B's index
>>>> version is different from its own
>>>> 7. server A begins replicating a new copy of the index from server B
>>>> using the (legacy?) replication handler
>>>> 8. the original index on server A was not deleted so it runs out of
>>>> disk space mid-replication
>>>> 9. server A throws an error, deletes the partially replicated index,
>>>> and then tries to replicate again
>>>> At this point I think steps 6  => 9 will loop forever
>>>> If the actual errors from solr.log are useful let me know, not doing
>>>> that now for brevity since this email is already pretty long.  In a
>>>> nutshell and in order, on server A I can find the error that took it
>>>> down, the post-recovery instruction from ZK to unregister itself as a
>>>> leader, the corrupt index error message, and then the (start - whoops,
>>>> out of disk- stop) loop of the replication messages.
>>>> I first want to ask if what I described is possible or did I get lost
>>>> somewhere along the way reading the docs?  Is there any reason to
>>>> that solr should not do this?
>>>> If my version of events is feasible I have a few other questions:
>>>> 1. What happens to the docs that were indexed on server A but never
>>>> replicated to server B before the failure?  Assuming that the replica
>>>> server A were to complete the recovery process would those docs appear
>>>> in the index or are they gone for good?
>>>> 2. I am guessing that the corrupt replica on server A is not deleted
>>>> because it is still viable, if server B had a catastrophic failure you
>>>> could pick up the pieces from server A.  If so is this a configurable
>>>> option somewhere?  I'd rather take my chances on server B going down
>>>> before replication finishes than be stuck in this state and have to
>>>> manually intervene.  Besides, I have disaster recovery backups for
>>>> exactly this situation.
>>>> 3. Is there anything I can do to prevent this type of failure?  It
>>>> seems to me that if server B gets even 1 new document as a leader the
>>>> shard will enter this state.  My only thought right now is to try to
>>>> stop sending documents for indexing the instant a leader goes down but
>>>> on the surface this solution sounds tough to implement perfectly (and
>>>> would have to be perfect).
>>>> If you got this far thanks for sticking with me.
>>>> Cheers,
>>>> Brian

View raw message