kafka-users mailing list archives

From Gwen Shapira <gshap...@cloudera.com>
Subject Re: leaderless topicparts after single node failure: how to repair?
Date Wed, 10 Dec 2014 15:55:43 GMT
It looks like none of your replicas are in sync. Did you enable unclean
leader election?
That would allow one of the out-of-sync replicas to become leader, losing
some data but restoring availability of the topic.
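For reference, in 0.8.1 unclean leader election is always on; the broker setting that controls it was only introduced in 0.8.2. A minimal sketch of the relevant `server.properties` entry on a 0.8.2+ broker (name per the 0.8.2 broker configuration docs):

```properties
# Allow an out-of-sync replica to become partition leader when no ISR
# member is available. Trades durability (possible data loss) for
# availability; set to false to refuse leadership and stay offline instead.
unclean.leader.election.enable=true
```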

Gwen


On Tue, Dec 9, 2014 at 8:43 AM, Neil Harkins <nharkins@gmail.com> wrote:

> Hi. We've suffered a single node HW failure (broker_id 4)
> with at least 2 replicas of each topic partition, but some
> topic parts are now leaderless (all were across 4,5):
>
> Topic: topic.with.two.replicas     Partition: 0    Leader: -1
> Replicas: 4,5   Isr:
>
> on broker 5, we see warnings like this in the logs:
>
> /var/log/kafka/kafka.log.2:[2014-12-05 05:21:28,216] 19186668
> [kafka-request-handler-4] WARN  kafka.server.ReplicaManager  -
> [Replica Manager on Broker 5]: While recording the follower position,
> the partition [topic.with.two.replicas,0] hasn't been created, skip
> updating leader HW
>
> /var/log/kafka/kafka.log.2:[2014-12-05 05:21:28,219] 19186671
> [kafka-request-handler-4] WARN  kafka.server.KafkaApis  - [KafkaApi-5]
> Fetch request with correlation id 36397 from client
> ReplicaFetcherThread-1-5 on partition [topic.with.two.replicas,0]
> failed due to Topic topic.with.two.replicas either doesn't exist or is
> in the process of being deleted
>
> We also have some topics with 3 replicas that are now leaderless:
>
> Topic:topic.with.three.replicas PartitionCount:6 ReplicationFactor:3
> Configs:
> Topic: topic.with.three.replicas Partition: 0 Leader: none Replicas: 3,1,2
> Isr:
>
> whose 'state' in zookeeper apparently disappeared:
> '/brokers/topics/topic.with.three.replicas/partitions/3/state':
> NoNodeError((), {})
>
> Our versions are:
> kafka 0.8.1
> zookeeper 3.4.5
>
> From searching archives of this list, the recommended "fix"
> is to blow away the topic(s) and recreate. At this point in time,
> that's an option, but it's not really acceptable for a reliable
> data pipeline. Are there options to repair specific partitions?
>
> -neil
>
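A repair path often suggested on this list for stuck, leaderless partitions (short of deleting the topic) is to force a controller re-election by removing the /controller znode, so the new controller re-reads partition state from ZooKeeper and retries leader election. A hedged sketch using the stock ZooKeeper CLI; the zkCli.sh path and the zk1:2181 address are placeholders, and you should verify the znode contents against your own cluster before deleting anything:

```
# Inspect the partition state znode that kafka-topics reports as missing.
zkCli.sh -server zk1:2181 \
  get /brokers/topics/topic.with.three.replicas/partitions/0/state

# Force a controller re-election: the broker that re-registers /controller
# re-reads topic/partition state and attempts to elect leaders for
# partitions that currently have none.
zkCli.sh -server zk1:2181 delete /controller
```

This does not recover lost replicas; if no replica with the data exists, the partition can only come back via unclean leader election or recreation.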
