lucene-solr-user mailing list archives

From Pavel Micka <Pavel.Mi...@zoomint.com>
Subject Solr Cloud: Zookeeper failure modes
Date Wed, 02 Jan 2019 08:36:06 GMT
Hi,
We are currently implementing Solr Cloud, and as part of this effort we are investigating
which failure modes may occur between Solr and Zookeeper.

We have found quite a lot of articles describing the "happy path" failure, where ZK stops (loses
its majority) and the Solr cluster ceases to serve write requests (while reads continue to work
as expected). Once the ZK cluster is reconciled and a majority is achieved again, everything continues
working as expected.

What we have not been able to find is what happens when the ZK cluster fails catastrophically
and loses its data, either completely (scenario A) or by being restarted from a backup (scenario B).

So now the questions:

1)      Scenario A - Is an existing Solr Cloud cluster able to start against a clean Zookeeper
and reconstruct all the ZK data from its internal state (using some kind of emergency recovery;
it may take a long time)?

2)      Scenario B - What is the worst-case backup/restore scenario? For example, when:

a.       ZK is backed up

b.       The cluster performs some transition between states "X -> Y" (such as committing a shard,
electing a new leader, etc.)

c.       ZK fails completely

d.       ZK is restored from the backup created in step a

e.       Solr Cloud is in state "Y", while ZK is in state "X"
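
To make steps a-e concrete, here is a toy sketch of the divergence we are worried about. It is only an illustration under loud assumptions: plain Python dicts stand in for the ZK data and for what the Solr nodes believe, the key "shard1_leader" and the node names are hypothetical, and no real Solr or Zookeeper API is involved.

```python
import copy

# Toy model only: dicts stand in for the data held in ZK and for the
# state the Solr nodes believe in. Keys and node names are hypothetical.

def run_scenario_b():
    zk_state = {"shard1_leader": "node1"}      # state "X"
    solr_view = copy.deepcopy(zk_state)        # Solr agrees with ZK

    backup = copy.deepcopy(zk_state)           # a. ZK is backed up

    zk_state["shard1_leader"] = "node2"        # b. transition "X -> Y"
    solr_view["shard1_leader"] = "node2"       #    (e.g. a new leader)

    zk_state = None                            # c. ZK fails completely
    zk_state = copy.deepcopy(backup)           # d. ZK restored from step a

    # e. collect every key on which ZK ("X") and Solr ("Y") now disagree
    diverged = {
        key: (zk_state.get(key), value)
        for key, value in solr_view.items()
        if zk_state.get(key) != value
    }
    return zk_state, solr_view, diverged

zk_state, solr_view, diverged = run_scenario_b()
print(diverged)  # {'shard1_leader': ('node1', 'node2')}
```

The question is what Solr Cloud actually does when it finds itself in the position of `diverged` being non-empty.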

Thanks in advance,

Pavel

