kafka-users mailing list archives

From Ismael Juma <ism...@juma.me.uk>
Subject Re: Insanely long recovery time with Kafka 0.11.0.2
Date Sat, 06 Jan 2018 15:36:19 GMT
Hi Ted,

The change you mention is not part of 0.11.0.2.

Ismael

On Sat, Jan 6, 2018 at 3:31 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> bq. WARN Found a corrupted index file due to requirement failed: Corrupt
> index found, index file
> (/data/kafka/data-processed-15/00000000000054942918.index)
>
> Can you search backward for 00000000000054942918.index in the log to see if
> we can find the cause of the corruption?
>
> This part of the code was recently changed by:
>
> KAFKA-6324; Change LogSegment.delete to deleteIfExists and harden log
> recovery
>
> Cheers
>
> On Sat, Jan 6, 2018 at 7:18 AM, Vincent Rischmann <vincent@rischmann.fr>
> wrote:
>
> > Here's an excerpt just after the broker started:
> > https://pastebin.com/tZqze4Ya
> >
> > After more than 8 hours of recovery the broker finally started. I haven't
> > read through all 8 hours of log but the parts I looked at are like the
> > pastebin.
> >
> > I'm not seeing much in the log cleaner logs either; they look normal. We
> > have a couple of compacted topics, but it seems only the consumer offsets
> > topic is ever compacted (the other topics don't have much traffic).
> >
> > On Sat, Jan 6, 2018, at 12:02 AM, Brett Rann wrote:
> > > > What do the broker logs say it's doing during all that time?
> > >
> > > > There are some consumer offset / log cleaner bugs which caused us
> > > > similarly long delays. That was easily visible by watching the log
> > > > cleaner activity in the logs, and in our monitoring of partition sizes
> > > > watching them go down, along with IO activity on the host for those files.
> > >
> > > > On Sat, Jan 6, 2018 at 7:48 AM, Vincent Rischmann <vincent@rischmann.fr>
> > > > wrote:
> > >
> > > > Hello,
> > > >
> > > > > so I'm upgrading my brokers from 0.10.1.1 to 0.11.0.2 to fix this bug
> > > > > https://issues.apache.org/jira/browse/KAFKA-4523
> > > > > Unfortunately while stopping one broker, it crashed exactly because of
> > > > > this bug. No big deal usually, except after restarting Kafka in 0.11.0.2
> > > > > the recovery is taking a really long time.
> > > > > I have around 6TB of data on that broker, and before when it crashed it
> > > > > usually took around 30 to 45 minutes to recover, but now I'm at almost
> > > > > 5h since Kafka started and it's still not recovered.
> > > > > I'm wondering what could have changed to have such a dramatic effect on
> > > > > recovery time? Is there maybe something I can tweak to try to reduce
> > > > > the time?
> > > > Thanks.
> > > >
> >
>
