lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Attention Solr 4.0 SolrCloud users
Date Fri, 07 Dec 2012 02:11:58 GMT
I should have sent this some time ago:

https://issues.apache.org/jira/browse/SOLR-3940 "Rejoining the leader election incorrectly
triggers the code path for a fresh cluster start rather than fail over."

The above is a somewhat ugly bug.

It means that if you are playing around with recovery and you kill a replica in a shard, it
will take 3 minutes before a new leader takes over.

This will be fixed in the upcoming 4.1 release (and has been fixed on the 4x branch since early October).

This wait is only meant for cluster startup. The idea is that you might introduce some random,
old, out-of-date shard and then start up your cluster - you don't want that shard to be a
leader - so we wait around for all known shards to start up so they can all participate in
the initial leader election and the best one can be chosen. It's meant as a protective measure
against a fairly unlikely event. But it's kicking in when it shouldn't.

You can just accept the 3 minute wait, or you can lower it (to something like 10 seconds, or
even 0 seconds - just avoid the scenario I mention above if you do).

You can set the wait time in solr.xml by adding the attribute leaderVoteWait={wait in milliseconds}
to the cores node.
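
For example, a rough sketch of what that might look like in a 4.0-style solr.xml, lowering the
wait to 10 seconds. Only the leaderVoteWait attribute comes from this message; the other
attributes and the core name are assumptions based on a typical example config, so keep
whatever your own file already has and just add the one attribute:

```xml
<solr persistent="true">
  <!-- leaderVoteWait is in milliseconds: 10000 = 10 seconds (the wait described above is 3 minutes = 180000) -->
  <!-- adminPath, defaultCoreName and the core entry below are illustrative placeholders -->
  <cores adminPath="/admin/cores" defaultCoreName="collection1" leaderVoteWait="10000">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```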

Sorry about this - completely my fault.

- Mark