kafka-users mailing list archives

From Mika Linnanoja <mika.linnan...@rovio.com>
Subject Re: Unavailable partitions after upgrade to kafka 1.0.0
Date Mon, 23 Apr 2018 08:19:17 GMT
On Mon, Apr 23, 2018 at 10:51 AM, Brett Rann <brann@zendesk.com.invalid> wrote:

> > Mostly updating version variable in our puppet config file (masterless)
> > and applying manually per instance. It works surprisingly well this way.
>
> Sure, we do the same, but with Chef. But we still follow that process. Lock
> in inter broker and log message format to existing version first. Upgrade 1
> binary and restart 1 broker.

I did not do the log message format trick, so this was a kind of brute-force
move to the new version, which is how we have always done it before. I shall
do it more manually next time, I suppose. Way less convenient, but worth it
if it avoids this kind of issue, which we had never seen before.
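
For anyone following along, the version-pinned procedure described above can
be sketched roughly as below. This is only an illustration of the ordering as
I understand it from the upgrade guide, not a tested runbook: the function
names are made up for this example, "0.11.0" is a placeholder for whatever
version the cluster currently runs, and it assumes the inter-broker protocol
and the message format are currently at the same version.

```python
def rolling_upgrade_phases(current_version, target_version):
    """Return the ordered phases of a version-pinned rolling upgrade.

    Each phase is (description, server.properties overrides). Every phase
    implies restarting brokers one at a time and checking cluster health
    before moving on to the next broker.
    """
    return [
        (
            "pin protocol and message format, then swap binaries",
            {
                "inter.broker.protocol.version": current_version,
                "log.message.format.version": current_version,
            },
        ),
        (
            "bump inter-broker protocol",
            {
                "inter.broker.protocol.version": target_version,
                "log.message.format.version": current_version,
            },
        ),
        (
            "bump message format once clients are upgraded",
            {
                "inter.broker.protocol.version": target_version,
                "log.message.format.version": target_version,
            },
        ),
    ]


def render_properties(overrides):
    """Render overrides as server.properties lines, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(overrides.items()))


# Placeholder versions; substitute the cluster's real current version.
phases = rolling_upgrade_phases("0.11.0", "1.0")
for step, (description, overrides) in enumerate(phases, start=1):
    print(f"# phase {step}: {description}")
    print(render_properties(overrides))
```

The point of splitting it into phases is that the binaries change first
while the wire protocol and on-disk format stay put, so a problem found
mid-way can still be rolled back.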

> Then check that everything is OK before proceeding one step at a time. And
> OK check is at minimum that there are no under replicated or offline
> partitions reported by the cluster.
> If you had offline partitions, was it across multiple brokers, or just 1?
> And at which part in that process did it happen?

See other reply. I did the rolling upgrade one broker at a time, but didn't
really keep track of when writes started to fail. I assume it was immediately
after the first of the affected brokers was upgraded. In alphabetical order
they were kind of "midway through".

I only realized there were unavailable partitions after the upgrade procedure
was complete, when investigating service logs to see what the errors were and
which topics/partitions they pointed to.
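
The "no under replicated or offline partitions" check between broker
restarts could be automated along these lines. This is a sketch only: it
assumes the text printed by `kafka-topics.sh --describe` has per-partition
lines shaped like the sample below, and that a partition without a leader
shows "-1" (or "none" on some versions). The function name and the sample
topic are invented for the example.

```python
import re

# Assumed shape of a per-partition line in `kafka-topics.sh --describe`.
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def classify_partitions(describe_output):
    """Return (offline, under_replicated) lists of (topic, partition)."""
    offline, under_replicated = [], []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match is None:
            continue  # topic summary lines and blank lines
        key = (match["topic"], int(match["partition"]))
        replicas = [r for r in match["replicas"].split(",") if r]
        isr = [r for r in match["isr"].split(",") if r]
        if match["leader"] in ("-1", "none"):
            offline.append(key)  # no leader: partition is unavailable
        elif len(isr) < len(replicas):
            under_replicated.append(key)  # has a leader but ISR is short
    return offline, under_replicated


# Hypothetical output for a three-partition topic called "events".
SAMPLE = "\n".join([
    "Topic:events\tPartitionCount:3\tReplicationFactor:2\tConfigs:",
    "\tTopic: events\tPartition: 0\tLeader: 1\tReplicas: 1,2\tIsr: 1,2",
    "\tTopic: events\tPartition: 1\tLeader: -1\tReplicas: 2,3\tIsr: ",
    "\tTopic: events\tPartition: 2\tLeader: 3\tReplicas: 3,1\tIsr: 3",
])

offline, under_replicated = classify_partitions(SAMPLE)
print("offline:", offline)
print("under-replicated:", under_replicated)
```

Running something like this after each broker restart, and stopping as soon
as either list is non-empty, would at least pin down at which broker things
started to go wrong.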

Doing upgrades to this cluster only once every 2 years, one tends to forget
which log lines from the services are important. Outside of upgrades or big
partition reassignments due to growth, it runs like a bullet train, zero

> Also, are all topics configured for HA? (RF >= 2, and RF >
> min.insync.replicas).

The latter does not seem to be in the config file, so it is whatever the
default is. I believe it mimics the RF (2).
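
The HA rule quoted above (RF >= 2, and RF > min.insync.replicas) is easy to
check mechanically. A minimal sketch, with a made-up function name; note
that, as far as I know, the broker default for min.insync.replicas is 1 and
does not follow the replication factor, so the effective value is worth
confirming rather than assuming.

```python
def ha_problems(replication_factor, min_insync_replicas):
    """Return a list of reasons why a topic is not configured for HA."""
    problems = []
    if replication_factor < 2:
        problems.append(
            "replication factor below 2: losing one broker takes the "
            "partition offline"
        )
    if replication_factor <= min_insync_replicas:
        problems.append(
            "RF is not greater than min.insync.replicas: acks=all writes "
            "fail as soon as one replica falls out of sync"
        )
    return problems


# RF=2 with min.insync.replicas=1 satisfies both conditions.
print(ha_problems(2, 1))
# RF=2 with min.insync.replicas=2 leaves no headroom during a restart.
print(ha_problems(2, 2))
```

If min.insync.replicas really did equal the RF of 2, every rolling restart
would block acks=all producers for the partitions hosted on the broker being
restarted, which is the second case above.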

If nothing else, let this incident of ours serve as a warning to do exactly
as the book (upgrade guide) says, rather than sort of winging it. Thanks for
the fast replies, lively mailing list!

