kafka-users mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Insanely long recovery time with Kafka 0.11.0.2
Date Sat, 06 Jan 2018 15:31:10 GMT
bq. WARN Found a corrupted index file due to requirement failed: Corrupt
index found, index file
(/data/kafka/data-processed-15/00000000000054942918.index)

Can you search backward for 00000000000054942918.index in the log to see if
we can find the cause of the corruption?
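
If the log is too large to page through by hand, a small script can do the
backward search. This is only a minimal sketch: the function names are made
up for illustration, and the log path you pass in (for example the broker's
server.log) is an assumption about your setup.

```python
from typing import Iterable, List, Tuple


def find_mentions(lines: Iterable[str], needle: str) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for lines containing needle,
    most recent first, so the search effectively runs backward."""
    hits = [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]
    hits.reverse()
    return hits


def search_log_file(path: str, needle: str) -> List[Tuple[int, str]]:
    """Scan one log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_mentions(handle, needle)
```

Calling search_log_file("/path/to/server.log", "00000000000054942918.index")
would list every mention of that index file, newest first, which should make
it easier to spot the first message that refers to it.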

This part of the code was recently changed by:

KAFKA-6324; Change LogSegment.delete to deleteIfExists and harden log
recovery

Cheers

On Sat, Jan 6, 2018 at 7:18 AM, Vincent Rischmann <vincent@rischmann.fr>
wrote:

> Here's an excerpt just after the broker started:
> https://pastebin.com/tZqze4Ya
>
> After more than 8 hours of recovery the broker finally started. I haven't
> read through all 8 hours of log but the parts I looked at are like the
> pastebin.
>
> I'm not seeing much in the log cleaner logs either, they look normal. We
> have a couple of compacted topics but it seems only the consumer offsets
> topic is ever compacted (the other topics don't have much traffic).
>
> On Sat, Jan 6, 2018, at 12:02 AM, Brett Rann wrote:
> > What do the broker logs say it's doing during all that time?
> >
> > There are some consumer offset / log cleaner bugs which caused us
> > similarly long delays. That was easily visible by watching the log
> > cleaner activity in the logs, and in our monitoring of partition sizes
> > watching them go down, along with IO activity on the host for those
> > files.
> >
> > On Sat, Jan 6, 2018 at 7:48 AM, Vincent Rischmann <vincent@rischmann.fr>
> > wrote:
> >
> > > Hello,
> > >
> > > so I'm upgrading my brokers from 0.10.1.1 to 0.11.0.2 to fix this bug
> > > https://issues.apache.org/jira/browse/KAFKA-4523
> > > Unfortunately while stopping one broker, it crashed exactly because of
> > > this bug. No big deal usually, except after restarting Kafka in
> > > 0.11.0.2 the recovery is taking a really long time.
> > > I have around 6TB of data on that broker, and before when it crashed it
> > > usually took around 30 to 45 minutes to recover, but now I'm at almost
> > > 5h since Kafka started and it's still not recovered.
> > > I'm wondering what could have changed to have such a dramatic effect on
> > > recovery time? Is there maybe something I can tweak to try to reduce
> > > the time?
> > > Thanks.
> > >
>
