kafka-users mailing list archives

From Bryan Baugher <bjb...@gmail.com>
Subject Re: Strange behavior during un-clean leader election
Date Tue, 21 Oct 2014 20:34:26 GMT
Yes, the cluster was to a degree restarted in a rolling fashion, but some
other events left the brokers rather confused: the ISR for a number of
partitions became empty and a new controller was elected.
KAFKA-1647 sounds exactly like the problem I encountered. Thank you.
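
For anyone wanting to check their own broker logs for the same pattern, here
is a minimal sketch (not part of the original exchange). It assumes the log
messages are worded as in the 0.8.1.1 excerpts quoted below and that each
message sits on a single line in the log file; the function name
find_unclean_truncations is hypothetical.

```python
import re

# Patterns taken from the log lines quoted in this thread (0.8.1.1 wording).
# Other Kafka versions may word these messages differently.
UNCLEAN = re.compile(
    r"No broker in ISR is alive for \[(?P<topic>[^,\]]+),(?P<part>\d+)\]"
)
TRUNCATE = re.compile(r"Truncating log (?P<log>\S+) to offset (?P<offset>\d+)")


def find_unclean_truncations(lines):
    """Return {log_name: offset} for logs truncated after an unclean election.

    A log is reported only if an unclean-election warning for the same
    topic-partition appeared earlier in the input.
    """
    unclean = set()
    truncated = {}
    for line in lines:
        m = UNCLEAN.search(line)
        if m:
            # Log directories are named "<topic>-<partition>".
            unclean.add(f"{m['topic']}-{m['part']}")
            continue
        m = TRUNCATE.search(line)
        if m and m["log"] in unclean:
            truncated[m["log"]] = int(m["offset"])
    return truncated
```

Feeding it the lines of a server log (for example, an open file object) gives
a dict of affected logs; an entry with offset 0 matches the symptom described
in this thread.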

On Tue, Oct 21, 2014 at 3:28 PM, Guozhang Wang <wangguoz@gmail.com> wrote:

> Bryan,
>
> Did you take down some brokers in your cluster while hitting KAFKA-1028? If
> yes, you may be hitting KAFKA-1647 also.
>
> Guozhang
>
> On Mon, Oct 20, 2014 at 1:18 PM, Bryan Baugher <bjbq4d@gmail.com> wrote:
>
> > Hi everyone,
> >
> > We run a 3-broker Kafka cluster on 0.8.1.1, with every topic at a
> > replication factor of 3, meaning every broker has a replica of every
> > partition.
> >
> > We recently ran into this issue (
> > https://issues.apache.org/jira/browse/KAFKA-1028) and saw data loss
> > within Kafka. We understand why it happened and have plans to try to
> > ensure it doesn't happen again.
> >
> > The strange part was that the broker that was chosen for the un-clean
> > leader election seemed to drop all of its own data about the partition
> > in the process, as our monitoring shows the broker offset was reset to
> > 0 for a number of partitions.
> >
> > Following the broker's server logs in chronological order for a
> > particular partition that saw data loss, I see this:
> >
> > 2014-10-16 10:18:11,104 INFO kafka.log.Log: Completed load of log TOPIC-6
> > with log end offset 528026
> >
> > 2014-10-16 10:20:18,144 WARN
> > kafka.controller.OfflinePartitionLeaderSelector:
> > [OfflinePartitionLeaderSelector]: No broker in ISR is alive for
> > [TOPIC,6]. Elect leader 1 from live brokers 1,2. There's potential
> > data loss.
> >
> > 2014-10-16 10:20:18,277 WARN kafka.cluster.Partition: Partition [TOPIC,6]
> > on broker 1: No checkpointed highwatermark is found for partition
> > [TOPIC,6]
> >
> > 2014-10-16 10:20:18,698 INFO kafka.log.Log: Truncating log TOPIC-6 to
> > offset 0.
> >
> > 2014-10-16 10:21:18,788 INFO kafka.log.OffsetIndex: Deleting index
> > /storage/kafka/00/kafka_data/TOPIC-6/00000000000000528024.index.deleted
> >
> > 2014-10-16 10:21:18,781 INFO kafka.log.Log: Deleting segment 528024 from
> > log TOPIC-6.
> >
> > I'm not too worried about this since I'm hoping to move to Kafka 0.8.2
> > ASAP, but I was curious if anyone could explain this behavior.
> >
> > -Bryan
> >
>
>
>
> --
> -- Guozhang
>



-- 
Bryan
