On 26 Aug 2016, at 12:58, kant kodali <kanth909@gmail.com> wrote:

@Steve, your arguments make sense; however, a good majority of people who have extensive experience with ZooKeeper prefer to avoid it, and given the ease of Consul (which btw uses Raft for leader election) and etcd, a lot of us are more inclined to avoid ZK.

And yes, any technology needs time to mature, but that shouldn't stop us from transitioning. For example, people started using Spark when it was first released instead of waiting for Spark 2.0, which has a lot of optimizations and bug fixes.


One way to look at the problem is "what is the cost if something doesn't work?"

If it's some HA consensus system, one failure mode is "consensus failure, everything goes into minority mode and offline": service lost, data fine. Another is "partition with both groups thinking they are in charge", which is more dangerous because both sides may accept conflicting updates. Then there's "partitioning event not detected", which may be bad.
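To make the first two failure modes concrete, here is a minimal sketch, assuming the simple majority-quorum rule that Raft and ZooKeeper both rely on: a group of nodes may only act as leader if it holds a strict majority of the whole cluster. The functions `has_quorum` and `partition_outcome` are hypothetical illustrations, not part of any real ZK, etcd or Consul API, and the undetected-partition case is not modelled.

```python
from typing import Dict, List


def has_quorum(group_size: int, cluster_size: int) -> bool:
    """A group may elect a leader only if it holds a strict majority of the cluster."""
    return group_size > cluster_size // 2


def partition_outcome(groups: List[int], cluster_size: int,
                      majority_rule: bool = True) -> Dict[int, str]:
    """Map each partitioned group (by index) to 'serving' or 'offline'.

    Hypothetical model: with majority_rule=False every group elects its own
    leader, which is the split-brain case where more than one group thinks
    it is in charge.
    """
    outcome: Dict[int, str] = {}
    for i, size in enumerate(groups):
        if majority_rule:
            outcome[i] = "serving" if has_quorum(size, cluster_size) else "offline"
        else:
            outcome[i] = "serving"
    return outcome


if __name__ == "__main__":
    # 5-node cluster split 3/2: only the majority side keeps serving.
    print(partition_outcome([3, 2], 5))  # {0: 'serving', 1: 'offline'}
    # 4-node cluster split 2/2: no side has a majority, so service is lost but data is fine.
    print(partition_outcome([2, 2], 4))  # {0: 'offline', 1: 'offline'}
    # No majority rule: both sides serve, i.e. split brain.
    print(partition_outcome([2, 2], 4, majority_rule=False))  # {0: 'serving', 1: 'serving'}
```

Because two disjoint groups can never both exceed half the cluster, the majority rule allows at most one serving group; the price is that an even split loses service entirely, which is the "offline, data fine" outcome rather than the more dangerous two-leaders one.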

So: consider the failure modes, and then ask not so much whether the tech you are using is vulnerable to them, but "if it goes wrong, does it matter?"


Even before HDFS had HA with ZK/BookKeeper, it didn't fail very often. And if you look at the causes of those failures, events like a backbone switch failure are so traumatic that a ZK/etcd failure isn't going to make much of a difference: the filesystem is down anyway.

Generally, integrity gets priority over availability. That said, S3 and the like have put availability ahead of consistency, and Cassandra can offer that too; sometimes it is the right strategy.