storm-user mailing list archives

From Anthony Milbourne <anthony.milbou...@mporium.com>
Subject UnknownHostException when a Zookeeper instance goes down on AWS
Date Wed, 22 Feb 2017 14:03:14 GMT
Hi,

We run a Storm cluster (v1.0.2) on AWS, backed by a three-node ZooKeeper ensemble. AWS
occasionally terminates VMs, so from time to time we lose a ZooKeeper instance. When that
happens, the hostname of the lost instance can no longer be resolved, because AWS has taken
the VM away. We noticed that in this case Storm fails to connect to ZooKeeper at all, even
though the other 2 ZooKeeper instances are still running. It fails with an exception like this:

java.net.UnknownHostException: zookeeper3
  at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
  at java.net.InetAddress.getAllByName(InetAddress.java:1192)
  at java.net.InetAddress.getAllByName(InetAddress.java:1126)
  at org.apache.storm.shade.org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
  at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
  at org.apache.storm.shade.org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
  at org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:150)
  at org.apache.storm.shade.org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
  at org.apache.storm.shade.org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
  at org.apache.storm.shade.org.apache.curator.ConnectionState.reset(ConnectionState.java:218)
  at org.apache.storm.shade.org.apache.curator.ConnectionState.start(ConnectionState.java:103)
  at org.apache.storm.shade.org.apache.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:190)
  at org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:259)
  at org.apache.storm.zookeeper$mk_client.doInvoke(zookeeper.clj:86)
  at clojure.lang.RestFn.invoke(RestFn.java:494)
  at org.apache.storm.cluster_state.zookeeper_state_factory$_mkState.invoke(zookeeper_state_factory.clj:28)
  at org.apache.storm.cluster_state.zookeeper_state_factory.mkState(Unknown Source)
  <SNIP REST OF STACKTRACE>

After some research, it looks like this error is caused by a bug in the ZooKeeper client
library. There is an issue for it here:
https://issues.apache.org/jira/browse/ZOOKEEPER-1576
It has been fixed in the 3.5.x branch of ZooKeeper. However, after 2.5 years and 3 releases,
the 3.5.x branch is still in alpha :(.
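To make the failure mode concrete, here is a minimal Java sketch, assuming a comma-separated
`host:port` connect string. `TolerantHostResolver`, `Resolver`, `resolveAllOrFail` and
`resolveTolerant` are hypothetical names, not part of ZooKeeper, Curator or Storm.
`resolveAllOrFail` mirrors the eager behaviour in the stack trace: `InetAddress.getAllByName`
throws `UnknownHostException` for the missing host and the whole connection attempt is
abandoned. `resolveTolerant` skips unresolvable hosts and fails only if none resolve, which,
as far as we can tell, is the general idea behind the ZOOKEEPER-1576 fix. It is an
illustration only, not a drop-in patch for the shaded client inside Storm.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration, not ZooKeeper's StaticHostProvider: contrasts
 * all-or-nothing host resolution with resolution that skips dead hosts.
 */
final class TolerantHostResolver {

    /** Hypothetical seam so DNS can be faked; real code would use SYSTEM_DNS. */
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    static final Resolver SYSTEM_DNS = InetAddress::getAllByName;
    static final int DEFAULT_PORT = 2181;

    private TolerantHostResolver() {}

    /** One "host[:port]" entry (IPv6 literals and chroot suffixes are not handled). */
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }

        static HostPort parse(String entry) {
            String s = entry.trim();
            int colon = s.lastIndexOf(':');
            if (colon < 0) {
                return new HostPort(s, DEFAULT_PORT);
            }
            return new HostPort(s.substring(0, colon), Integer.parseInt(s.substring(colon + 1)));
        }
    }

    /** Eager resolution: the first unresolvable host aborts everything. */
    static List<InetSocketAddress> resolveAllOrFail(String connectString, Resolver resolver)
            throws UnknownHostException {
        List<InetSocketAddress> addresses = new ArrayList<>();
        for (String entry : connectString.split(",")) {
            HostPort hp = HostPort.parse(entry);
            for (InetAddress addr : resolver.resolve(hp.host)) { // throws for a dead host
                addresses.add(new InetSocketAddress(addr, hp.port));
            }
        }
        return addresses;
    }

    /** Tolerant resolution: skip hosts that do not resolve; fail only if none do. */
    static List<InetSocketAddress> resolveTolerant(String connectString, Resolver resolver) {
        List<InetSocketAddress> addresses = new ArrayList<>();
        List<String> unresolved = new ArrayList<>();
        for (String entry : connectString.split(",")) {
            HostPort hp = HostPort.parse(entry);
            try {
                for (InetAddress addr : resolver.resolve(hp.host)) {
                    addresses.add(new InetSocketAddress(addr, hp.port));
                }
            } catch (UnknownHostException e) {
                unresolved.add(hp.host); // a real client would log this and carry on
            }
        }
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("No ZooKeeper host could be resolved: " + unresolved);
        }
        return addresses;
    }
}
```

For example, with a resolver that cannot find `zookeeper3`, calling `resolveTolerant` on
"zookeeper1:2181,zookeeper2:2181,zookeeper3:2181" returns the two remaining addresses,
whereas `resolveAllOrFail` throws, just like the client in the stack trace.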

Even though 3.5.x is still in alpha, there is a Curator branch (v3.x.x) built against it,
but Storm uses Curator 2.x.x, possibly because that line does not depend on alpha code.
So the bug is still unpatched in Storm.

Does anyone have experience of this issue?
Can anyone offer any ideas for workarounds?

Thanks,

     Anthony.

Anthony Milbourne
anthony.milbourne@mporium.com
mporium.com
mporium Group Plc, registered in England and Wales - First Floor, 106 New Bond Street, London,
W1S 1DN
