kafka-users mailing list archives

From Jun Rao <jun...@gmail.com>
Subject Re: Kafka rebalancing causes Zookeeper to fail
Date Thu, 23 Jan 2014 04:44:29 GMT
You can find some of the GC settings in
https://cwiki.apache.org/confluence/display/KAFKA/Operations

There were some ZK bugs exposed during session expiration, which were fixed
in 3.3.4. Not sure if 3.4.5 exposes any new issues. The easiest thing is
probably to avoid GC-induced ZK session timeouts in the first place, or to
use a larger session timeout.
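
As a rough illustration of the "larger session timeout" option, here is a
minimal sketch in Java that builds consumer properties and checks them. It
assumes the 0.8 high-level consumer property names (zookeeper.connect,
group.id, zookeeper.session.timeout.ms, rebalance.backoff.ms,
rebalance.max.retries). The host, group name, class name, and all numeric
values are placeholders, not recommendations. The check follows the FAQ
guidance that rebalance.max.retries * rebalance.backoff.ms should exceed
zookeeper.session.timeout.ms.

```java
import java.util.Properties;

public class ConsumerTimeoutSketch {

    // Builds consumer properties with an explicit ZK session timeout.
    // Property names are assumed to match the 0.8 high-level consumer.
    static Properties consumerProps(String zkConnect, String groupId,
                                    int sessionTimeoutMs, int backoffMs, int maxRetries) {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", zkConnect);
        props.setProperty("group.id", groupId);
        props.setProperty("zookeeper.session.timeout.ms", Integer.toString(sessionTimeoutMs));
        props.setProperty("rebalance.backoff.ms", Integer.toString(backoffMs));
        props.setProperty("rebalance.max.retries", Integer.toString(maxRetries));
        return props;
    }

    // True when the total rebalance retry window is longer than the session
    // timeout, so a consumer keeps retrying until stale ephemeral nodes expire.
    static boolean rebalanceWindowCoversSession(Properties props) {
        long sessionMs = Long.parseLong(props.getProperty("zookeeper.session.timeout.ms"));
        long backoffMs = Long.parseLong(props.getProperty("rebalance.backoff.ms"));
        long retries = Long.parseLong(props.getProperty("rebalance.max.retries"));
        return retries * backoffMs > sessionMs;
    }

    public static void main(String[] args) {
        // Placeholder values: a 15 s session timeout with a 10 x 2 s retry window.
        Properties props = consumerProps("localhost:2181", "example-group", 15000, 2000, 10);
        System.out.println("retry window covers session timeout: "
                + rebalanceWindowCoversSession(props));
    }
}
```

The resulting Properties object is what would be handed to the consumer
configuration. A longer session timeout tolerates longer GC pauses, at the
cost of slower detection of consumers that have actually died.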

Thanks,

Jun


On Wed, Jan 22, 2014 at 8:29 AM, Ahmed H. <ahmed.hammad@gmail.com> wrote:

> Hello,
>
> I looked at that, not sure if it is applicable or not at this point. We
> used to have frequent rebalances, but that issue was mitigated by
> increasing the zktimeout on the consumer side. With that said, it may still
> be a problem. I haven't collected any metrics concerning rebalances in a
> while. I will certainly take a look at our current GC settings. What are
> typical settings that we should have for GC (I am not sure of what exactly
> I'm looking for)?
>
> As for downgrading the Zookeeper version, would there be any major loss of
> functionality? Version 3.4.5 is currently stable, so I am unsure of how it
> would help. I can try it and let it soak for a while to see if it helps or
> not. The problem is we have many components that tie into Zookeeper and I'm
> worried that downgrading may break some of our API calls to it.
>
> Is there a good way of trying to narrow this problem down further?
>
> Thanks again
>
>
> On Wed, Jan 22, 2014 at 10:15 AM, Jun Rao <junrao@gmail.com> wrote:
>
> > Not sure how stable ZK 3.4.5 is. Could you try 3.3.4? Also, see if
> > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog?
> > is applicable.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jan 22, 2014 at 6:24 AM, Ahmed H. <ahmed.hammad@gmail.com> wrote:
> >
> > > I have a basic Zookeeper/Kafka setup. I am still on Kafka 0.8 beta 1, and
> > > Zookeeper 3.4.5. The activity on this machine isn't massive...I would say
> > > the Kafka queues get a consistent 1 message every 2-3 seconds, as well as
> > > occasional spikes, but still nothing large enough to push the limits. Both
> > > Kafka and Zookeeper are running on the same machine.
> > >
> > > Occasionally, a rebalance is triggered, which causes our Kafka clients to
> > > try reconnecting several times, but it ultimately fails with the following
> > > error:
> > >
> > >
> > > 04:56:10,020 INFO  [kafka.consumer.ZookeeperConsumerConnector] (alarms.topology.updates_<host>-1383643783747-c7775701_watcher_executor) [alarms.topology.updates_<host>-1383643783747-c7775701], exception during rebalance : org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/alarms.topology.updates/ids/alarms.topology.updates_<host>-1383643783747-c7775701
> > >         at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) [zkclient-0.3.jar:0.3]
> > >         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) [zkclient-0.3.jar:0.3]
> > >         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766) [zkclient-0.3.jar:0.3]
> > >         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761) [zkclient-0.3.jar:0.3]
> > >         at kafka.utils.ZkUtils$.readData(ZkUtils.scala:407) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > >         at kafka.consumer.TopicCount$.constructTopicCount(TopicCount.scala:52) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > >         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:401) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > >         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:374) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > >         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78) [scala-library-2.9.2.jar:]
> > >         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:369) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > >         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:326) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > > Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/alarms.topology.updates/ids/alarms.topology.updates_<host>-1383643783747-c7775701
> > >         at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) [zookeeper-3.4.3.jar:3.4.3-1240972]
> > >         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) [zookeeper-3.4.3.jar:3.4.3-1240972]
> > >         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131) [zookeeper-3.4.3.jar:3.4.3-1240972]
> > >         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160) [zookeeper-3.4.3.jar:3.4.3-1240972]
> > >         at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103) [zkclient-0.3.jar:0.3]
> > >         at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770) [zkclient-0.3.jar:0.3]
> > >         at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766) [zkclient-0.3.jar:0.3]
> > >         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) [zkclient-0.3.jar:0.3]
> > >         ... 9 more
> > >
> > >
> > > Our Kafka consumers are written in Clojure (
> > > https://github.com/pingles/clj-kafka).
> > >
> > > Any ideas on what can cause such behaviour? The rebalances themselves
> > > happen sporadically, but when they do, they sometimes fail and an error
> > > like the one above is shown. I'm not sure if this is a Kafka or Zookeeper
> > > problem at this point, but any help would be appreciated.
> > >
> > > Thanks
> > >
> >
>
