kafka-users mailing list archives

From Andrew Otto <o...@wikimedia.org>
Subject Broker rejoin with big replica lag
Date Wed, 05 Feb 2014 17:01:06 GMT
Hi all!

I recently had a problem where one of my two brokers would not reboot due to a hardware
failure.  The broker was down for almost a week before the required part arrived and our
datacenter tech fixed it.  During that time, the live broker was able to handle all messages
for all topics and partitions (which is awesome!).  The first broker is now back, and is trying
to catch up with the messages that it missed while it was down.  The lower volume topics are
all caught up, but I have one high volume topic (around 40K msgs/sec) that is taking much
longer.  I just took a few samples of Replica-MaxLag to see how long it would take to catch
up.  Currently, it is behind about 12.5 million messages and is catching up at a rate of about
1600 msgs/sec.  At that rate, it’ll take around 9 days before the replica is caught up to
the leader.
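
For reference, the catch-up estimate above is just the remaining lag divided by the net rate at which the lag shrinks. The sketch below is a rough illustration only: the function names are hypothetical, and it assumes the rate stays constant. With the figures as quoted (12.5 million messages at 1600 msgs/sec) it gives about 2.2 hours rather than 9 days; a 9-day catch-up at that rate would correspond to a lag of roughly 1.24 billion messages, so one of the quoted figures may be off.

```python
SECONDS_PER_DAY = 86400


def net_rate_from_samples(lag_start, lag_end, elapsed_seconds):
    """Net catch-up rate (msgs/sec) from two lag samples taken elapsed_seconds apart."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (lag_start - lag_end) / elapsed_seconds


def catch_up_seconds(lag_messages, net_rate_per_sec):
    """Seconds until the lag reaches zero, assuming a constant net rate."""
    if net_rate_per_sec <= 0:
        raise ValueError("replica is not gaining on the leader")
    return lag_messages / net_rate_per_sec


def lag_for_duration(days, net_rate_per_sec):
    """Lag (messages) that would take `days` to clear at the given net rate."""
    return days * SECONDS_PER_DAY * net_rate_per_sec


if __name__ == "__main__":
    # Figures quoted in the message.
    eta = catch_up_seconds(12.5e6, 1600)
    print("ETA with quoted lag: %.1f hours" % (eta / 3600))

    # Lag implied by a 9-day catch-up at the same rate.
    print("Lag implied by 9 days: %d messages" % lag_for_duration(9, 1600))
```

Taking the two lag samples several minutes apart and passing them to `net_rate_from_samples` should smooth out short bursts in producer traffic.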

Is there any way to speed this up?

Or, alternatively, I don’t actually care about this topic’s history.  It is a new topic,
and I know that it doesn't yet have any consumers.  I’d be fine with instructing both brokers
to drop old logs and just start from the head of the log.  I could do this by manually deleting
the topic (both its Kafka data files and its entries in ZooKeeper), but to do so properly with
0.8.0 I think I’d have to shut down the whole cluster, correct?  I’d rather not do this, as another topic
does have a consumer and I don’t want to lose messages for it.

-Andrew Otto
