lucene-solr-user mailing list archives

From Daniel Collins <danwcoll...@gmail.com>
Subject Solr cloud recovery, why does restarting leader need replicas?
Date Wed, 28 Nov 2012 15:47:15 GMT
I was testing the basic SolrCloud test scenario from the wiki page, and
found something I considered unexpected.

If the leader of a shard goes down, then when it comes back up it requires N
replicas to be running (where N is, I think, determined by how many were
running before).

Simple setup: 4 servers, 2 shards (A, B), each with 2 replicas, i.e. A1,
A2, B1, B2.

All 4 nodes start up; A1 and B1 are leaders; all is well.

A2 is brought down; the cloud is still fine. A2 is brought back up and
recovers; once recovery is complete, it is live.

A2 goes down, then A1.  The cloud is now unresponsive, as shard A has no
nodes (as expected).

A1 comes back up.  However, the shard is still unresponsive, and the log shows:

2012-11-28 10:45:27,328 INFO [main] o.a.s.c.ShardLeaderElectionContext
[ElectionContext.java:287] Waiting until we see more replicas up: total=2
found=1 timeoutin=140262
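
To make the fields in that log line concrete, here is a minimal sketch in Python. It is an illustration only, not Solr's code: the names parse_wait_status and should_keep_waiting are hypothetical, and the rule "keep waiting while found < total and the timeout has not run out" is an assumption inferred from the wording of the message, with timeoutin presumed to be in milliseconds.

```python
import re

# Hypothetical parser for the log line quoted above. The field names
# (total, found, timeoutin) come from the message; the rest is assumed.
LOG_PATTERN = re.compile(r"total=(\d+)\s+found=(\d+)\s+timeoutin=(\d+)")


def parse_wait_status(log_text):
    """Return (total, found, timeout_ms) from the log text, or None."""
    match = LOG_PATTERN.search(log_text)
    if match is None:
        return None
    total, found, timeout_ms = (int(g) for g in match.groups())
    return total, found, timeout_ms


def should_keep_waiting(total, found, timeout_ms):
    """Assumed rule: wait while replicas are missing and time remains."""
    return found < total and timeout_ms > 0


# The message as it appears above, wrapped across two lines.
log_text = (
    "Waiting until we see more replicas up: total=2\n"
    "found=1 timeoutin=140262"
)
status = parse_wait_status(log_text)
print(status)
print(should_keep_waiting(*status))
```

Under that assumed rule, A1 would keep waiting here because it sees only 1 of the 2 replicas it expects and still has time left on the timeout.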

I can understand that in the cloud setup A1 (if it wasn't the leader) would
have to recover, but as A1 was the leader when it went down, shouldn't it be
able to service requests on its own? (It was doing so when it went down!)
