kafka-users mailing list archives

From Vincent Rischmann <vinc...@rischmann.fr>
Subject Re: Insanely long recovery time with Kafka 0.11.0.2
Date Sat, 06 Jan 2018 15:18:43 GMT
Here's an excerpt just after the broker started: https://pastebin.com/tZqze4Ya

After more than 8 hours of recovery the broker finally started. I haven't read through all
8 hours of log but the parts I looked at are like the pastebin.

I'm not seeing much in the log cleaner logs either; they look normal. We have a couple of
compacted topics, but it seems only the consumer offsets topic is ever compacted (the other
topics don't have much traffic).
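
One rough way to do the partition-size monitoring suggested in the quoted reply below is to sum the file sizes in each partition directory and compare snapshots over time. This is a minimal sketch, not a tool from this thread: it assumes the usual layout where each partition is a directory named `<topic>-<partition>` under one of the broker's `log.dirs`, and the function names and the example path are hypothetical.

```python
import sys
from pathlib import Path


def partition_sizes(log_dir):
    """Return {partition_dir_name: total_bytes} for each subdirectory of log_dir.

    Assumes each partition lives in its own directory (e.g. "__consumer_offsets-12")
    directly under log_dir; only regular files in that directory are counted.
    """
    sizes = {}
    for entry in Path(log_dir).iterdir():
        if entry.is_dir():
            sizes[entry.name] = sum(
                f.stat().st_size for f in entry.iterdir() if f.is_file()
            )
    return sizes


def largest(sizes, n=10):
    """Return the n largest (name, bytes) pairs, biggest first."""
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python partition_sizes.py /var/kafka-logs
    for name, size in largest(partition_sizes(sys.argv[1])):
        print(f"{size:>15d}  {name}")
```

Running it periodically and diffing the output would show whether the `__consumer_offsets-*` directories shrink as the log cleaner works.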

On Sat, Jan 6, 2018, at 12:02 AM, Brett Rann wrote:
> What do the broker logs say it's doing during all that time?
> 
> There are some consumer offset / log cleaner bugs which caused us similarly
> long delays. That was easily visible by watching the log cleaner activity in
> the logs, and in our monitoring of partition sizes watching them go down,
> along with IO activity on the host for those files.
> 
> On Sat, Jan 6, 2018 at 7:48 AM, Vincent Rischmann <vincent@rischmann.fr>
> wrote:
> 
> > Hello,
> >
> > so I'm upgrading my brokers from 0.10.1.1 to 0.11.0.2 to fix this bug
> > https://issues.apache.org/jira/browse/KAFKA-4523
> > Unfortunately while stopping one broker, it crashed exactly because of
> > this bug. No big deal usually, except after restarting Kafka in 0.11.0.2
> > the recovery is taking a really long time.
> > I have around 6TB of data on that broker, and before when it crashed it
> > usually took around 30 to 45 minutes to recover, but now I'm at almost
> > 5h since Kafka started and it's still not recovered.
> > I'm wondering what could have changed to have such a dramatic effect on
> > recovery time? Is there maybe something I can tweak to try to reduce
> > the time?
> > Thanks.
> >
