kafka-users mailing list archives

From Andrew Otto <o...@wikimedia.org>
Subject Re: Broker rejoin with big replica lag
Date Wed, 05 Feb 2014 22:18:01 GMT
> - Increasing num.replica.fetchers (default is one)
Awesome! I just tried this one and bumped it up to 8 (this broker box has 12 cores). It is now
catching up at around 17K msgs/sec, which means it should finish in about 4 or 5 hours.
I’ll check on it again tomorrow.

That should do it. Thanks!
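
For anyone finding this thread later, the change might look roughly like this
in the broker's server.properties. This is a sketch, not a copy of the actual
config: the value 8 is the one mentioned above, and the replica.fetch.max.bytes
line is Joel's second suggestion with a purely illustrative value (the default
is 1 MB). On 0.8.0 the broker has to be restarted to pick these up.

```properties
# Number of fetcher threads used to replicate from each source broker.
# Default is 1; 8 is the value tried in this message.
num.replica.fetchers=8

# Optional second knob: bytes fetched per partition per fetch request.
# Default is 1048576 (1 MB). The value below is an assumption for
# illustration only; keep it modest if the broker hosts many partitions.
#replica.fetch.max.bytes=4194304
```

Both settings can be reverted once the replica is back in sync.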

On Feb 5, 2014, at 5:04 PM, Joel Koshy <jjkoshy.w@gmail.com> wrote:

>> topics are all caught up, but I have one high volume topic (around
>> 40K msgs/sec) that is taking much longer.  I just took a few samples
>> of Replica-MaxLag to see how long it would take to catch up.
>> Currently, it is behind about 12.5 million messages and is catching
>> up at a rate of about 1600 msgs/sec.  At that rate, it’ll take
>> around 9 days before the replica is caught up to the leader.
>> Is there any way to speed this up?
> During the period your high-volume topic is under-replicated you can
> temporarily try one or both of the following:
> - Increasing num.replica.fetchers (default is one)
> - If you don't have too many topic-partitions you can also increase
>  replica.fetch.max.bytes.
>> Or, alternatively, I don’t actually care about this topic’s
>> history.  It is a new topic, and I know that it doesn't yet have any
>> consumers.  I’d be fine with instructing both brokers to drop
>> old logs and just start from the top of the log.  I could do this by
>> manually deleting the topic (kafka data files and in zookeeper), but
>> to do so properly with 0.8.0 I think I’d have to shut down the
>> whole cluster, correct?  I’d rather not do this, as another
>> topic does have a consumer and I don’t want to lose messages for
>> it.
> Right - or you could do a rolling bounce and change the retention
> settings (http://kafka.apache.org/documentation.html#brokerconfigs) of
> that topic to something low so it gets expired and then do another
> rolling bounce to remove the override.
> -- 
> Joel
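
A sketch of the temporary retention override Joel describes. It assumes the
per-topic override property listed in the 0.8.0 broker configs
(log.retention.hours.per.topic, in topic:hours form); please verify the name
against the documentation page linked above. The topic name is a placeholder.

```properties
# server.properties on each broker, applied with a rolling bounce.
# Assumption: 0.8.0-style per-topic override, format topic:hours.
# "my_high_volume_topic" is a hypothetical name; substitute the real topic.
log.retention.hours.per.topic=my_high_volume_topic:1
```

Once the old log segments have expired, remove the line and do a second
rolling bounce so the topic returns to the cluster-wide retention.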
