kafka-users mailing list archives

From "Ahmed H." <ahmed.ham...@gmail.com>
Subject Re: Kafka rebalancing causes Zookeeper to fail
Date Thu, 23 Jan 2014 19:24:04 GMT
When you say "use a larger session timeout", which session timeout do you
refer to? Is it the zookeeper session timeout variable that we define when
creating a Kafka consumer? Or is there a different session timeout?

As for downgrading, that is not an option for the time being, so I will
need better debugging tools to pinpoint the cause.

Thanks
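
For reference, the timeout in question appears to be the consumer-side
property `zookeeper.session.timeout.ms`, which is the "zktimeout" raised
earlier in this thread. Below is a minimal sketch, assuming the Kafka 0.8
consumer property names; the class name, the connect string
`localhost:2181`, and the 30000 ms value are illustrative assumptions, not
recommendations from the list.

```java
import java.util.Properties;

final class ConsumerTimeoutSketch {

    // Builds consumer properties with an explicit ZooKeeper session timeout.
    // Property names are assumed to match the Kafka 0.8 consumer configuration.
    static Properties consumerProps(String zkConnect, String groupId, int sessionTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", zkConnect);
        props.setProperty("group.id", groupId);
        // How long ZooKeeper waits without a heartbeat before expiring the session.
        props.setProperty("zookeeper.session.timeout.ms", Integer.toString(sessionTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical values: ZooKeeper on the same machine, group id taken
        // from the /consumers/... path in the stack trace.
        Properties props = consumerProps("localhost:2181", "alarms.topology.updates", 30000);
        System.out.println(props.getProperty("zookeeper.session.timeout.ms"));
    }
}
```

These properties would normally be handed to `kafka.consumer.ConsumerConfig`.
Note that the ZooKeeper server caps the negotiated session timeout at its
`maxSessionTimeout` (20 x `tickTime` unless configured otherwise), so a
larger client value may not take effect unless the server allows it.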


On Wed, Jan 22, 2014 at 11:44 PM, Jun Rao <junrao@gmail.com> wrote:

> You can find some of the GC settings in
> https://cwiki.apache.org/confluence/display/KAFKA/Operations
>
> There were some ZK bugs exposed during session expiration, which were fixed
> in 3.3.4. Not sure if 3.4.5 exposes any new issues. The easiest thing is
> probably to avoid GC-induced ZK session timeout in the first place or use a
> larger session timeout.
>
> Thanks,
>
> Jun
>
>
> On Wed, Jan 22, 2014 at 8:29 AM, Ahmed H. <ahmed.hammad@gmail.com> wrote:
>
> > Hello,
> >
> > I looked at that, not sure if it is applicable or not at this point. We
> > used to have frequent rebalances, but that issue was mitigated by
> > increasing the zktimeout on the consumer side. With that said, it may still
> > be a problem. I haven't collected any metrics concerning rebalances in a
> > while. I will certainly take a look at our current GC settings. What are
> > typical settings that we should have for GC (I am not sure of what exactly
> > I'm looking for)?
> >
> > As for downgrading the Zookeeper version, would there be any major loss of
> > functionality? Version 3.4.5 is currently stable, so I am unsure of how it
> > would help. I can try it and let it soak for a while to see if it helps or
> > not. The problem is we have many components that tie into Zookeeper and I'm
> > worried that downgrading may break some of our API calls to it.
> >
> > Is there a good way of trying to narrow this problem down further?
> >
> > Thanks again
> >
> >
> > On Wed, Jan 22, 2014 at 10:15 AM, Jun Rao <junrao@gmail.com> wrote:
> >
> > > Not sure how stable ZK 3.4.5 is. Could you try 3.3.4? Also, see if
> > > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog?
> > > is applicable.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Jan 22, 2014 at 6:24 AM, Ahmed H. <ahmed.hammad@gmail.com> wrote:
> > >
> > > > I have a basic Zookeeper/Kafka setup. I am still on Kafka 0.8 beta 1, and
> > > > Zookeeper 3.4.5. The activity on this machine isn't massive...I would say
> > > > the Kafka queues get a consistent 1 message every 2-3 seconds, as well as
> > > > occasional spikes, but still nothing large enough to push the limits. Both
> > > > Kafka and Zookeeper are running on the same machine.
> > > >
> > > > Occasionally, a rebalance is triggered, which causes our Kafka clients to
> > > > try reconnecting several times, but it ultimately fails with the following
> > > > error:
> > > >
> > > >
> > > > 04:56:10,020 INFO  [kafka.consumer.ZookeeperConsumerConnector] (alarms.topology.updates_<host>-1383643783747-c7775701_watcher_executor) [alarms.topology.updates_<host>-1383643783747-c7775701], exception during rebalance : org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/alarms.topology.updates/ids/alarms.topology.updates_<host>-1383643783747-c7775701
> > > >         at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) [zkclient-0.3.jar:0.3]
> > > >         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) [zkclient-0.3.jar:0.3]
> > > >         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766) [zkclient-0.3.jar:0.3]
> > > >         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761) [zkclient-0.3.jar:0.3]
> > > >         at kafka.utils.ZkUtils$.readData(ZkUtils.scala:407) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > > >         at kafka.consumer.TopicCount$.constructTopicCount(TopicCount.scala:52) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > > >         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:401) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > > >         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:374) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > > >         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78) [scala-library-2.9.2.jar:]
> > > >         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:369) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > > >         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:326) [kafka_2.9.2-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> > > > Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/alarms.topology.updates/ids/alarms.topology.updates_<host>-1383643783747-c7775701
> > > >         at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) [zookeeper-3.4.3.jar:3.4.3-1240972]
> > > >         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) [zookeeper-3.4.3.jar:3.4.3-1240972]
> > > >         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131) [zookeeper-3.4.3.jar:3.4.3-1240972]
> > > >         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160) [zookeeper-3.4.3.jar:3.4.3-1240972]
> > > >         at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103) [zkclient-0.3.jar:0.3]
> > > >         at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770) [zkclient-0.3.jar:0.3]
> > > >         at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766) [zkclient-0.3.jar:0.3]
> > > >         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) [zkclient-0.3.jar:0.3]
> > > >         ... 9 more
> > > >
> > > >
> > > > Our Kafka consumers are written in Clojure (
> > > > https://github.com/pingles/clj-kafka).
> > > >
> > > > Any ideas on what can cause such behaviour? The rebalances themselves
> > > > happen sporadically, but when they do, they sometimes fail and an error
> > > > like the one above is shown. I'm not sure if this is a Kafka or Zookeeper
> > > > problem at this point, but any help would be appreciated.
> > > >
> > > > Thanks
> > > >
> > >
> >
>
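
On narrowing down whether GC is behind the session expirations discussed in
this thread: one coarse check is to sample the JVM's accumulated GC time from
inside the consumer process and compare it with the session timeout. The
sketch below uses the standard `java.lang.management` API; the class name,
the 6000 ms timeout, the one-second sampling interval, and the 0.5 threshold
are illustrative assumptions. Accumulated collection time is not the same as
a single pause, so this only hints at a problem; GC logs remain the more
precise source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcPauseSketch {

    // Sum of accumulated collection time (ms) across all collectors in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // True when GC time accumulated over one sampling interval reaches the
    // given fraction of the session timeout.
    static boolean nearTimeout(long gcDeltaMs, long sessionTimeoutMs, double fraction) {
        return gcDeltaMs >= sessionTimeoutMs * fraction;
    }

    public static void main(String[] args) throws InterruptedException {
        long sessionTimeoutMs = 6000; // assumed consumer session timeout
        long before = totalGcMillis();
        Thread.sleep(1000);           // hypothetical sampling interval
        long delta = totalGcMillis() - before;
        System.out.println("GC ms in last interval: " + delta
                + ", near timeout: " + nearTimeout(delta, sessionTimeoutMs, 0.5));
    }
}
```

In a real consumer this sampling would run periodically on a background
thread, so that a rebalance failure in the log can be lined up against a
spike in GC time.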
