kafka-users mailing list archives

From Andrew Otto <o...@wikimedia.org>
Subject Unexpected broker election
Date Fri, 21 Feb 2014 18:22:18 GMT
Hi all,

This has happened a couple of times to me now in the past month, and I’m not entirely sure
of the cause, although I have a suspicion.

Early this morning (UTC), it looks like one of my two brokers (id 21) lost its connection
to Zookeeper for a very short period of time.  This caused the second broker (id 22) to quickly
become the leader for all partitions.  Once broker 21 was able to re-establish its Zookeeper
connection, it noticed that it had a stale ISR list, fetched the updated list, and started
replicating from broker 22 for all partitions.  Broker 21 then quickly rejoined the ISR, but
annoyingly (though as expected), broker 22 remained the leader.  All of this happened in under
a minute.

I’m wondering if https://issues.apache.org/jira/browse/KAFKA-766 is related.  The current
batch size on our producers is 6000 msgs or 1000 ms (I’ve been meaning to reduce this).
We do about 6000 msgs per second per producer, and have 10 partitions in the relevant
topic.  A couple of days ago, we noticed flapping ISR Shrink/Expand logs, so I upped replica.lag.max.messages
to 10000, so that it would surely be above our batch size.  I still occasionally see flapping
ISR Shrinks/Expands, but hope that when I reduce the producer batch size, I will stop seeing
these.
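
As a rough illustration of the batch-size reasoning above, here is a minimal sketch of the
arithmetic.  It is not Kafka's actual ISR logic.  It assumes a whole producer batch lands on a
single partition at once, so a follower is momentarily behind by the full batch.  The function
names and the producers_hitting_partition parameter are hypothetical, made up for this example.

```python
# Hypothetical helpers for illustration only; not part of Kafka.

def worst_case_burst(batch_size_msgs, producers_hitting_partition=1):
    """Messages that may be appended to one partition leader in a single burst.

    Assumes each producer sends its whole batch to the same partition.
    """
    return batch_size_msgs * producers_hitting_partition


def may_flap(batch_size_msgs, replica_lag_max_messages, producers_hitting_partition=1):
    """True if a single burst could put a follower more than
    replica.lag.max.messages behind the leader (assumed trigger for an ISR shrink)."""
    burst = worst_case_burst(batch_size_msgs, producers_hitting_partition)
    return burst > replica_lag_max_messages


# Values from this message: batch size 6000 msgs, replica.lag.max.messages 10000.
one_producer = may_flap(6000, 10000)
# Assumption: two producers flush full batches to the same partition together.
two_producers = may_flap(6000, 10000, producers_hitting_partition=2)
print(one_producer, two_producers)
```

Under these assumptions a single 6000-msg batch stays below the 10000 threshold, but two
overlapping batches on the same partition would exceed it, which might be one reason the
flapping has not gone away entirely.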

Anyway, I’m not entirely sure what happened here.  Could flapping ISRs potentially cause
this?

For reference, the relevant logs from my brokers and a zookeeper are here: https://gist.github.com/ottomata/9139443

Thanks!
-Andrew Otto


