kafka-users mailing list archives

From Manikumar <manikumar.re...@gmail.com>
Subject Re: Unavailable partitions after upgrade to kafka 1.0.0
Date Mon, 23 Apr 2018 07:29:49 GMT
Hi,

Before Kafka 1.1.0, if unclean leader election is enabled and no replicas
remain in the ISR, the leader is set to -1 and the ISR is left empty.
During an upgrade, this situation arises for partitions that have a single
replica, or whose replicas all drop out of the ISR.

From Kafka 0.11.0.0 onwards, unclean leader election is disabled by default.
With this change, Kafka cannot elect a new leader for a partition whose ISR
is empty and whose leader is -1.

What is the replication factor? Was unclean leader election enabled (it was
enabled by default in 0.10.0.1)?
With a sufficient replication factor and a healthy ISR, you are unlikely to
see this issue.
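
As a rough pre-upgrade check, something like the sketch below could flag partitions that would be left without a leader by a single broker restart. It is only an illustration: the function name `at_risk_partitions` is made up here, and it assumes the tab-separated `Topic: / Partition: / Leader: / Replicas: / Isr:` layout that `kafka-topics.sh --describe` prints, which may differ between versions.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of Kafka): read the output of
# `kafka-topics.sh --describe` on stdin and print every partition that has
# no leader (-1) or fewer than two in-sync replicas.
# Assumes tab-separated "Name: value" fields on each partition line.
at_risk_partitions() {
  awk '
    # Strip the "Name:" label and surrounding whitespace from one field.
    function val(s) { sub(/^[^:]*: */, "", s); sub(/[ \r]+$/, "", s); return s }

    # Partition lines carry both "Partition:" and "Isr:"; summary lines do not.
    /Partition:/ && /Isr:/ {
      topic = part = leader = isr = ""
      n = split($0, f, "\t")
      for (i = 1; i <= n; i++) {
        if      (f[i] ~ /^Topic:/)     topic  = val(f[i])
        else if (f[i] ~ /^Partition:/) part   = val(f[i])
        else if (f[i] ~ /^Leader:/)    leader = val(f[i])
        else if (f[i] ~ /^Isr:/)       isr    = val(f[i])
      }
      # An empty ISR splits into zero members.
      isr_count = split(isr, members, ",")
      if (leader == "-1" || isr_count < 2)
        print topic, part, "leader=" leader, "isr_count=" isr_count
    }
  '
}
```

It would be fed from the describe command, for example `kafka/bin/kafka-topics.sh --zookeeper <zk's> --describe | at_risk_partitions`. Any partition it lists has at most one in-sync replica, so restarting that broker during a rolling upgrade would empty the ISR.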


On Mon, Apr 23, 2018 at 12:29 PM, Enrique Medina Montenegro <
e.medina.m@gmail.com> wrote:

> What type of storage do you have for your setup?
>
>
> On April 23, 2018 at 8:04:46 AM, Mika Linnanoja <
> mika.linnanoja@rovio.com> wrote:
>
>> Hello,
>>
>> Last week I upgraded one relatively large Kafka 0.10.0.1 cluster (EC2, 10
>> brokers, ~30 TB of data, 100-300 Mbps in/out per instance) to 1.0, and saw
>> some issues.
>>
>> Out of ~100 topics with 2..20 partitions each, 9 partitions in 8 topics
>> became "unavailable" across 3 brokers. The leader was shown as -1 and the
>> ISR was empty. A Java service using 0.10.0.1 clients was unable to send any
>> data to these partitions, so that data was dropped.
>>
>> The partitions showed up in the output of `kafka/bin/kafka-topics.sh
>> --zookeeper <zk's> --unavailable-partitions --describe`. There was nothing
>> special about these partitions: among them were big ones (hundreds of
>> gigabytes) and tiny ones (megabytes).
>>
>> The fix was to enable unclean leader election on the affected topics and
>> restart one of the affected brokers for each partition:
>> `kafka/bin/kafka-configs.sh --zookeeper <zk's> --entity-type topics
>> --entity-name <topicname> --add-config
>> unclean.leader.election.enable=true --alter`.
>>
>> Has anyone seen something like this, and is there a way to avoid it at the
>> next upgrade? Maybe it would be better if the cluster got no traffic during
>> the upgrade, but we cannot have a maintenance break as everything is up 24/7.
>> The cluster is for analytics data, some of which is consumed in real-time
>> applications, mostly by secor.
>>
>> BR,
>> Mika
>>
>> --
>> *Mika Linnanoja*
>> Senior Cloud Engineer
>> Games Technology
>> Rovio Entertainment Corp
>> Keilaranta 7, FIN - 02150 Espoo, Finland
>> mika.linnanoja@rovio.com
>> www.rovio.com
>>
