kafka-users mailing list archives

From 张祥 <xiangzhang1...@gmail.com>
Subject Re: what happened in case of single disk failure
Date Thu, 12 Mar 2020 00:50:55 GMT
Thanks, very helpful!

On Thu, Mar 12, 2020 at 5:48 AM, Peter Bukowinski <pmbuko@gmail.com> wrote:

> Yes, that’s correct. While a broker is down:
>
> - All topic partitions assigned to that broker will be under-replicated.
> - Topic partitions with an unmet minimum ISR count will be offline.
> - Leadership of partitions meeting the minimum ISR count will move to the
>   next in-sync replica in the replica list.
> - If no in-sync replica exists for a topic partition, it will be offline.
>
> Setting unclean.leader.election.enable=true will allow an out-of-sync
> replica to become a leader. If topic partition availability is more
> important to you than data integrity, you should allow unclean leader
> election.
>
>
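
As a rough illustration of the leader-election rules listed above, here is a minimal Python sketch. It is a simplified model and not Kafka's controller code: the function names elect_leader and is_under_replicated, the broker ids, and the unclean parameter (standing in for unclean.leader.election.enable) are assumptions made for this example only.

```python
from typing import List, Optional, Set


def elect_leader(replicas: List[int], isr: Set[int], live: Set[int],
                 unclean: bool = False) -> Optional[int]:
    """Return the new leader's broker id, or None if the partition is offline.

    replicas -- replica list in assignment order (hypothetical broker ids)
    isr      -- brokers that were in sync before the failure
    live     -- brokers that are currently up
    unclean  -- stands in for unclean.leader.election.enable
    """
    # Prefer the first replica in the list that is both alive and in sync.
    for broker in replicas:
        if broker in live and broker in isr:
            return broker
    # With unclean election, fall back to any live replica, even if it is
    # out of sync (availability is favored over data integrity).
    if unclean:
        for broker in replicas:
            if broker in live:
                return broker
    return None


def is_under_replicated(replicas: List[int], isr: Set[int],
                        live: Set[int]) -> bool:
    """A partition is under-replicated when fewer replicas are in sync
    than are assigned to it."""
    return len(isr & live) < len(replicas)


replicas = [1, 2, 3]
live = {2, 3}  # broker 1 is down

print(is_under_replicated(replicas, {1, 2, 3}, live))          # True
print(elect_leader(replicas, isr={1, 2, 3}, live=live))        # 2
print(elect_leader(replicas, isr={1}, live=live))              # None
print(elect_leader(replicas, isr={1}, live=live, unclean=True))  # 2
```

The last two calls show the trade-off described above: with only the failed broker in sync, the partition stays offline unless unclean election is allowed, in which case an out-of-sync replica takes over.
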
> > On Mar 11, 2020, at 6:11 AM, 张祥 <xiangzhang1128@gmail.com> wrote:
> >
> > Hi Peter, following up on what we discussed before, I want to understand
> > what will happen when one broker goes down. I would say it is very
> > similar to what happens under disk failure, except that the rules apply
> > to all the partitions on that broker instead of only those on the one
> > failed disk. Am I right? Thanks.
> >
> > On Thu, Mar 5, 2020 at 9:25 AM, 张祥 <xiangzhang1128@gmail.com> wrote:
> >
> >> Thanks Peter, really appreciate it.
> >>
> >> On Wed, Mar 4, 2020 at 11:50 PM, Peter Bukowinski <pmbuko@gmail.com> wrote:
> >>
> >>> Yes, you should restart the broker. I don’t believe there’s any code
> >>> to check if a log directory previously marked as failed has returned
> >>> to healthy.
> >>>
> >>> I always restart the broker after a hardware repair. I treat broker
> >>> restarts as a normal, non-disruptive operation in my clusters. I use a
> >>> minimum of 3x replication.
> >>>
> >>> -- Peter (from phone)
> >>>
> >>>> On Mar 4, 2020, at 12:46 AM, 张祥 <xiangzhang1128@gmail.com> wrote:
> >>>>
> >>>> Another question: as far as I remember, the broker needs to be
> >>>> restarted after replacing the disk to recover from this. Is that
> >>>> correct? If so, I take it that Kafka cannot detect by itself that the
> >>>> disk has been replaced, so a manual restart is necessary.
> >>>>
> >>>> On Wed, Mar 4, 2020 at 2:48 PM, 张祥 <xiangzhang1128@gmail.com> wrote:
> >>>>
> >>>>> Thanks Peter, it makes a lot of sense.
> >>>>>
> >>>>> On Tue, Mar 3, 2020 at 11:56 AM, Peter Bukowinski <pmbuko@gmail.com> wrote:
> >>>>>
> >>>>>> Whether your brokers have a single data directory or multiple data
> >>>>>> directories on separate disks, when a disk fails, the topic
> >>>>>> partitions located on that disk become unavailable. What happens
> >>>>>> next depends on how your cluster and topics are configured.
> >>>>>>
> >>>>>> If the topics on the affected broker have replicas and the minimum
> >>>>>> ISR (in-sync replicas) count is met, then all topic partitions will
> >>>>>> remain online and leaders will move to another broker. Producers and
> >>>>>> consumers will continue to operate as usual.
> >>>>>>
> >>>>>> If the topics don’t have replicas or the minimum ISR count is not
> >>>>>> met, then the topic partitions on the failed disk will be offline.
> >>>>>> Producers can still send data to the affected topics; it will just
> >>>>>> go to the online partitions. Consumers can still consume data from
> >>>>>> the online partitions.
> >>>>>> -- Peter
> >>>>>>
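
The disk-failure behavior described above can be sketched in a few lines of Python. This is only a toy model following the rules as stated in this thread, not Kafka's implementation: the function name after_disk_failure, the directory paths, the partition names, and the other_isr and min_isr parameters are all hypothetical and exist only for this example.

```python
from typing import Dict, Set, Tuple


def after_disk_failure(
    log_dirs: Dict[str, Set[str]],
    failed_dir: str,
    other_isr: Dict[str, int],
    min_isr: int = 1,
) -> Tuple[Set[str], Set[str]]:
    """Split one broker's partitions into (online, offline) after a disk fails.

    log_dirs  -- hypothetical layout: log directory -> partitions stored there
    other_isr -- partition -> number of in-sync replicas on *other* brokers
    min_isr   -- stands in for the topic's minimum ISR setting
    """
    online: Set[str] = set()
    offline: Set[str] = set()
    for directory, partitions in log_dirs.items():
        for partition in partitions:
            if directory != failed_dir:
                online.add(partition)   # healthy disk, nothing changes
            elif other_isr.get(partition, 0) >= max(1, min_isr):
                online.add(partition)   # a leader takes over on another broker
            else:
                offline.add(partition)  # no usable replica elsewhere
    return online, offline


# One broker with two data directories on separate disks (made-up names).
log_dirs = {"/data/d1": {"orders-0", "orders-2"}, "/data/d2": {"orders-1"}}
online, offline = after_disk_failure(
    log_dirs, "/data/d1", other_isr={"orders-0": 2, "orders-2": 0}
)
print(sorted(online))   # ['orders-0', 'orders-1']
print(sorted(offline))  # ['orders-2']
```

Only the unreplicated partition on the failed disk goes offline here; the partition on the healthy disk and the replicated one stay available, so producers and consumers keep working against those.
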
> >>>>>>>> On Mar 2, 2020, at 7:00 PM, 张祥 <xiangzhang1128@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi community,
> >>>>>>>>
> >>>>>>>> I ran into a disk failure when using Kafka, and fortunately it did
> >>>>>>>> not crash the entire cluster. So I am wondering how Kafka handles
> >>>>>>>> multiple disks and how it manages to keep working in case of a
> >>>>>>>> single disk failure. The more detailed, the better. Thanks!
> >>>>>>
> >>>>>
> >>>
> >>
>
>
