storm-user mailing list archives

From S G <sg.online.em...@gmail.com>
Subject Re: UnknownHostException when a Zookeeper instance goes down on AWS
Date Thu, 23 Feb 2017 16:32:36 GMT
Thanks Anthony,

This is a serious issue with zookeeper.

Did you try asking in the ZooKeeper forums why 3.5.x is still in alpha?
It might be worth knowing what is keeping them from releasing a non-alpha
version.

Thanks
SG

On Wed, Feb 22, 2017 at 6:03 AM, Anthony Milbourne <
anthony.milbourne@mporium.com> wrote:

> Hi,
>
> We run a storm cluster (v.1.0.2) on AWS and have 3 Zookeepers supporting
> it.  Because AWS sometimes terminates VMs, we sometimes lose a Zookeeper
> instance.  When this happens, the hostname cannot be resolved for that
> zookeeper instance as AWS has taken the VM away.  We noticed that in this
> case storm fails to connect to zookeeper – even though there are still 2
> Zookeeper instances running.  It fails with an exception something like:
>
> java.net.UnknownHostException: zookeeper3
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1192)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1126)
>   at org.apache.storm.shade.org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
>   at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
>   at org.apache.storm.shade.org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
>   at org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:150)
>   at org.apache.storm.shade.org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
>   at org.apache.storm.shade.org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
>   at org.apache.storm.shade.org.apache.curator.ConnectionState.reset(ConnectionState.java:218)
>   at org.apache.storm.shade.org.apache.curator.ConnectionState.start(ConnectionState.java:103)
>   at org.apache.storm.shade.org.apache.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:190)
>   at org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:259)
>   at org.apache.storm.zookeeper$mk_client.doInvoke(zookeeper.clj:86)
>   at clojure.lang.RestFn.invoke(RestFn.java:494)
>   at org.apache.storm.cluster_state.zookeeper_state_factory$_mkState.invoke(zookeeper_state_factory.clj:28)
>   at org.apache.storm.cluster_state.zookeeper_state_factory.mkState(Unknown Source)
>   <SNIP REST OF STACKTRACE>
>
> Having done some research it looks like this error is caused by a bug in
> the Zookeeper client library.  There is an issue for it here:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-1576
>
> This issue has been resolved in the version 3.5.x branch of Zookeeper.
> However, after 2.5 years and 3 releases the 3.5.x branch of Zookeeper is
> still in Alpha :(
>
> Despite the fact that it is in alpha, there is a branch of Curator
> (v.3.x.x) that uses it, but Storm uses Curator version 2.x.x – possibly
> because it doesn’t rely on alpha code.
>
> So the bug is still unpatched in Storm.
>
> Does anyone have experience of this issue?
>
> Can anyone offer any ideas for workarounds?
>
> Thanks,
>
> Anthony.
> Anthony Milbourne
>
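
The failure described above (one unresolvable hostname makes the whole client fail, even with two live instances) suggests one possible workaround for code that builds its own ZooKeeper connect string: drop the entries that do not currently resolve before handing the string to the client. The following is only a minimal sketch of that idea, not how Storm or Curator behaves. `ResolvableHosts`, `resolves` and `filter` are hypothetical names introduced here for illustration. Storm assembles its own connect string from `storm.zookeeper.servers`, so using this with Storm itself would need a hook that this sketch does not provide.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper: not part of Storm, Curator or ZooKeeper.
public class ResolvableHosts {

    /** Returns true if the JVM can currently resolve the hostname. */
    public static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /**
     * Keeps only the "host:port" entries whose host passes canResolve.
     * An optional chroot suffix (for example "/storm") is preserved.
     */
    public static String filter(String connectString, Predicate<String> canResolve) {
        String chroot = "";
        int slash = connectString.indexOf('/');
        if (slash >= 0) {
            chroot = connectString.substring(slash);
            connectString = connectString.substring(0, slash);
        }
        List<String> kept = new ArrayList<>();
        for (String entry : connectString.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.lastIndexOf(':');
            String host = colon >= 0 ? trimmed.substring(0, colon) : trimmed;
            if (canResolve.test(host)) {
                kept.add(trimmed);
            }
        }
        if (kept.isEmpty()) {
            throw new IllegalStateException("no resolvable ZooKeeper hosts");
        }
        return String.join(",", kept) + chroot;
    }

    public static void main(String[] args) {
        String servers = "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181";
        // Real DNS lookup; pass a different predicate to simulate a lost host.
        System.out.println(filter(servers, ResolvableHosts::resolves));
    }
}
```

The resolver is passed in as a predicate so the filtering can be exercised without real DNS. Note the trade-off: a host that is filtered out at startup is not picked up again when it comes back, so the client would have to be rebuilt to see it.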
