spark-user mailing list archives

From Steve Loughran
Subject Re: What do I loose if I run spark without using HDFS or Zookeeper?
Date Fri, 26 Aug 2016 16:50:48 GMT

On 26 Aug 2016, at 12:58, kant kodali wrote:

@Steve your arguments make sense; however, a good majority of people who have extensive
experience with ZooKeeper prefer to avoid it, and given the ease of Consul (which, by the way,
uses Raft for leader election) and etcd, a lot of us are more inclined to avoid ZK.

And yes, any technology needs time to mature, but that shouldn't stop us from transitioning.
For example, people started using Spark when it was first released instead of waiting for Spark
2.0, which has a lot of optimizations and bug fixes.

One way to look at the problem is "what is the cost if something doesn't work?"

If it's some HA consensus system, one failure mode is "consensus failure, everything goes into
minority mode and offline": service lost, data fine. Another is "partition with both groups
thinking they are in charge", which is more dangerous. Then there's "partitioning event not
detected", which may be bad.
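
To make the first two failure modes concrete, here is a minimal sketch of a majority-quorum check. It is not how ZK, etcd or Consul implement it; the names `has_quorum` and `classify` and the `quorum` override are hypothetical, made up for illustration. The point is only that with a strict-majority rule at most one side of a partition can stay in charge, and that split brain needs that rule to be broken somehow (modelled here as a quorum threshold set too low).

```python
from typing import List, Optional


def has_quorum(reachable: int, cluster_size: int, quorum: Optional[int] = None) -> bool:
    """True if a group of `reachable` nodes may act as the leader side."""
    # Default rule: strict majority of the full cluster.
    needed = cluster_size // 2 + 1 if quorum is None else quorum
    return reachable >= needed


def classify(partition_sizes: List[int], quorum: Optional[int] = None) -> str:
    """Name the outcome when the cluster splits into groups of the given sizes."""
    cluster_size = sum(partition_sizes)
    in_charge = sum(has_quorum(n, cluster_size, quorum) for n in partition_sizes)
    if in_charge == 0:
        return "no quorum: service offline, data fine"
    if in_charge == 1:
        return "one side keeps quorum: service stays up"
    # Only reachable when the quorum threshold is below a strict majority.
    return "split brain: more than one side thinks it is in charge"


print(classify([3, 2]))            # 5 nodes split 3/2
print(classify([2, 2, 1]))         # 5 nodes split 2/2/1
print(classify([3, 2], quorum=2))  # threshold set too low (hypothetical misconfiguration)
```

The third failure mode, an undetected partition, does not show up in a sketch like this at all, because the check assumes each side knows how many nodes it can reach.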

So: consider the failure modes, and then consider not so much whether the tech you are using
is vulnerable to them, but "if it goes wrong, does it matter?"

Even before HDFS had HA with ZK/BookKeeper, it didn't fail very often. And if you looked at
the causes of those failures, things like backbone switch failure are so traumatic that things
like ZK/etcd failures aren't going to make much of a difference. The filesystem is down.

Generally, integrity gets priority over availability. That said, S3 and the like have put
availability ahead of consistency; Cassandra can offer that too. Sometimes it is the right
