kafka-users mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Insanely long recovery time with Kafka 0.11.0.2
Date Sat, 06 Jan 2018 15:53:36 GMT
Ismael:
We're on the same page.

0.11.0.2 was released on 17 Nov 2017.

By 'recently' in my previous email I meant the change was newer.

Vincent:
Did the machine your broker ran on experience a power issue?

Cheers

On Sat, Jan 6, 2018 at 7:36 AM, Ismael Juma <ismael@juma.me.uk> wrote:

> Hi Ted,
>
> The change you mention is not part of 0.11.0.2.
>
> Ismael
>
> On Sat, Jan 6, 2018 at 3:31 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > bq. WARN Found a corrupted index file due to requirement failed: Corrupt
> > index found, index file
> > (/data/kafka/data-processed-15/00000000000054942918.index)
> >
> > Can you search backward for 00000000000054942918.index in the log to see
> > if we can find the cause of the corruption?
> >
> > This part of the code was recently changed by:
> >
> > KAFKA-6324; Change LogSegment.delete to deleteIfExists and harden log
> > recovery
> >
> > Cheers
> >
> > On Sat, Jan 6, 2018 at 7:18 AM, Vincent Rischmann <vincent@rischmann.fr>
> > wrote:
> >
> > > Here's an excerpt just after the broker started:
> > > https://pastebin.com/tZqze4Ya
> > >
> > > After more than 8 hours of recovery the broker finally started. I
> > > haven't read through all 8 hours of log but the parts I looked at are
> > > like the pastebin.
> > >
> > > I'm not seeing much in the log cleaner logs either, they look normal.
> > > We have a couple of compacted topics but it seems only the consumer
> > > offsets topic is ever compacted (the other topics don't have much
> > > traffic).
> > >
> > > On Sat, Jan 6, 2018, at 12:02 AM, Brett Rann wrote:
> > > > What do the broker logs say it's doing during all that time?
> > > >
> > > > There are some consumer offset / log cleaner bugs which caused us
> > > > similarly long delays. That was easily visible by watching the log
> > > > cleaner activity in the logs, and in our monitoring of partition
> > > > sizes watching them go down, along with IO activity on the host for
> > > > those files.
> > > >
> > > > On Sat, Jan 6, 2018 at 7:48 AM, Vincent Rischmann
> > > > <vincent@rischmann.fr> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > so I'm upgrading my brokers from 0.10.1.1 to 0.11.0.2 to fix this
> > > > > bug: https://issues.apache.org/jira/browse/KAFKA-4523
> > > > > Unfortunately, while stopping one broker, it crashed exactly
> > > > > because of this bug. No big deal usually, except after restarting
> > > > > Kafka in 0.11.0.2 the recovery is taking a really long time.
> > > > > I have around 6TB of data on that broker, and before when it
> > > > > crashed it usually took around 30 to 45 minutes to recover, but
> > > > > now I'm at almost 5h since Kafka started and it's still not
> > > > > recovered.
> > > > > I'm wondering what could have changed to have such a dramatic
> > > > > effect on recovery time? Is there maybe something I can tweak to
> > > > > try to reduce the time?
> > > > > Thanks.
> > > > >
> > >
> >
>
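
Ted's suggestion in the thread (search backward through the broker log for 00000000000054942918.index) could be done with something like the sketch below. It is only an illustration and not part of the original messages: the function names and the log path in the usage note are hypothetical, and it assumes the log is small enough to read into memory. For a very large log a plain grep is probably more practical.

```python
from typing import Iterable, List, Tuple


def find_mentions_backward(
    lines: Iterable[str], needle: str, limit: int = 20
) -> List[Tuple[int, str]]:
    """Return up to `limit` (line_number, text) pairs containing `needle`,
    starting from the end of the log and walking toward the beginning."""
    all_lines = list(lines)  # assumption: the log fits in memory
    hits: List[Tuple[int, str]] = []
    for idx in range(len(all_lines) - 1, -1, -1):
        if needle in all_lines[idx]:
            # Report 1-based line numbers so they match what an editor shows.
            hits.append((idx + 1, all_lines[idx].rstrip("\n")))
            if len(hits) >= limit:
                break
    return hits


def search_log_file(path: str, needle: str, limit: int = 20) -> List[Tuple[int, str]]:
    """Open a log file (hypothetical path) and search it backward for `needle`."""
    with open(path, errors="replace") as fh:
        return find_mentions_backward(fh, needle, limit)
```

Calling search_log_file("/path/to/server.log", "00000000000054942918.index") would list the most recent mentions first, so the corruption warning shows up before whatever was logged about that segment earlier.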
