kafka-users mailing list archives

From Paul Mackles <pmack...@adobe.com>
Subject broker giving up partition leadership for no apparent reason
Date Mon, 23 Sep 2013 00:42:14 GMT
With 0.8, we have a situation where a broker is removing itself (or being removed) as a leader
for no apparent reason. The cluster has 3 nodes. In this case, broker id=1 stopped leading.
This is what I see in the server.log at the time it stopped leading:

[2013-09-22 14:00:06,141] INFO re-registering broker info in ZK for broker 1 (kafka.server.KafkaZooKeeper)
[2013-09-22 14:00:06,507] INFO Registered broker 1 at path /brokers/ids/1 with address
[2013-09-22 14:00:06,508] INFO done re-registering broker (kafka.server.KafkaZooKeeper)
[2013-09-22 14:00:06,509] INFO Subscribing to /brokers/topics path to watch for new topics
[2013-09-22 14:00:06,515] INFO Closing socket connection to / (kafka.network.Processor)
[2013-09-22 14:00:06,519] INFO conflict in /controller data: 1 stored data: 2 (kafka.utils.ZkUtils$)
[2013-09-22 14:00:06,526] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
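The "conflict in /controller data" line means this broker tried to claim the controller znode but found broker 2 already registered there. If it helps anyone reproduce what I'm seeing, the current controller can be inspected directly in ZooKeeper with the zookeeper-shell script that ships with Kafka (substitute your own ZK connect string for localhost:2181):

```shell
# Show which broker currently holds the controller role.
# The znode data is the id of the controller broker.
bin/zookeeper-shell.sh localhost:2181 get /controller

# The ephemeral broker registrations can be checked the same way:
bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
```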

The broker process itself stayed up, and I was able to get it back to leading simply by running the preferred-replica-election tool. Looking at server.log, controller.log, and state-change.log on all 3 brokers, it's unclear what triggered this. I thought it might be a problem communicating with ZooKeeper, but I don't see any such errors. The broker had been running fine for several days prior to this. I also looked at the GC logs and don't see any long-running garbage collection at that time.
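For reference, this is roughly how I ran the tool to restore leadership; with no --path-to-json-file argument it triggers preferred replica election for all partitions (again, substitute your own ZK connect string):

```shell
# Move leadership back to the preferred replica (the first replica
# in each partition's assigned replica list) for all partitions.
bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181
```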

What else should I be looking for?

