kafka-users mailing list archives

From Peter Bukowinski <pmb...@gmail.com>
Subject Re: what happened in case of single disk failure
Date Wed, 04 Mar 2020 15:50:10 GMT
Yes, you should restart the broker. I don’t believe there’s any code to check whether a log
directory previously marked as failed has returned to a healthy state.

I always restart the broker after a hardware repair. I treat broker restarts as a normal,
non-disruptive operation in my clusters. I use a minimum of 3x replication.
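
To illustrate why the restart is needed, here is a toy model in Python. It is only a sketch of the behavior described above, not Kafka's actual code; the class name, method names, and directory paths are all hypothetical. It assumes a directory marked offline stays offline until the broker process starts again and re-checks every configured directory.

```python
class BrokerModel:
    """Toy model of a broker's log directories (hypothetical, not Kafka code)."""

    def __init__(self, log_dirs):
        self.configured = list(log_dirs)  # directories the broker was configured with
        self.offline = set()              # directories marked failed since startup

    def on_io_error(self, log_dir):
        # A disk error marks the directory offline. Nothing in this model
        # ever re-checks it while the broker keeps running.
        self.offline.add(log_dir)

    def online_dirs(self):
        return [d for d in self.configured if d not in self.offline]

    def restart(self, healthy_dirs):
        # Assumption: on startup every configured directory is checked again,
        # so a replaced disk is picked up only at this point.
        self.offline = {d for d in self.configured if d not in healthy_dirs}


broker = BrokerModel(["/data/1", "/data/2"])   # hypothetical paths
broker.on_io_error("/data/2")                  # the disk behind /data/2 fails
during_failure = broker.online_dirs()          # only /data/1 is usable
broker.restart(healthy_dirs={"/data/1", "/data/2"})  # disk replaced, broker restarted
after_restart = broker.online_dirs()           # both directories are usable again
```

In this model, replacing the disk without calling restart() changes nothing, which is the point of the advice above.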

-- Peter (from phone)

> On Mar 4, 2020, at 12:46 AM, 张祥 <xiangzhang1128@gmail.com> wrote:
> Another question: as far as I remember, the broker needs to be restarted
> after replacing the disk in order to recover. Is that correct? If so, I take
> it that Kafka cannot detect on its own that the disk has been replaced, so a
> manual restart is necessary.
> 张祥 <xiangzhang1128@gmail.com> wrote on Wed, Mar 4, 2020 at 2:48 PM:
>> Thanks Peter, it makes a lot of sense.
>> Peter Bukowinski <pmbuko@gmail.com> wrote on Tue, Mar 3, 2020 at 11:56 AM:
>>> Whether your brokers have a single data directory or multiple data
>>> directories on separate disks, when a disk fails, the topic partitions
>>> located on that disk become unavailable. What happens next depends on how
>>> your cluster and topics are configured.
>>> If the topics on the affected broker have replicas and the minimum ISR
>>> (in-sync replicas) count is met, then all topic partitions will remain
>>> online and leaders will move to another broker. Producers and consumers
>>> will continue to operate as usual.
>>> If the topics don’t have replicas, then the topic partitions on the failed
>>> disk will be offline. If replicas remain but the minimum ISR count is not
>>> met, producers using acks=all will have their writes to those partitions
>>> rejected until the ISR recovers. Producers can still send data to the
>>> affected topics; it will just go to the online partitions. Consumers can
>>> still consume data from the online partitions.
>>> -- Peter
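
The cases described above can be sketched as a small Python model. This is a simplified illustration, not how the Kafka controller is implemented: the Partition class and after_disk_failure function are hypothetical names, unclean leader election is assumed to be disabled, and the minimum ISR check is assumed to apply only to producers using acks=all.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    replicas: list   # broker ids hosting a replica, in preferred order
    isr: set         # broker ids currently in sync
    leader: int      # broker id of the current leader
    min_isr: int = 1 # models the min.insync.replicas setting


def after_disk_failure(p, failed_broker):
    """Return (leader, writable_with_acks_all) after failed_broker loses its replica."""
    isr = p.isr - {failed_broker}
    if p.leader != failed_broker:
        leader = p.leader
    else:
        # Only in-sync replicas may take over (unclean election assumed off).
        candidates = [b for b in p.replicas if b in isr]
        leader = candidates[0] if candidates else None
    writable = leader is not None and len(isr) >= p.min_isr
    return leader, writable


# Replicated, min ISR still met: the leader moves and writes continue.
replicated = after_disk_failure(Partition("t-0", [1, 2, 3], {1, 2, 3}, 1, 2), 1)

# No replicas: the partition has no leader and is offline.
unreplicated = after_disk_failure(Partition("t-1", [1], {1}, 1, 1), 1)

# Replicated, but the ISR drops below the minimum: a leader exists,
# yet acks=all writes are rejected in this model.
below_min = after_disk_failure(Partition("t-2", [1, 2], {1, 2}, 1, 2), 1)
```

With 3x replication and a minimum ISR of 2, a single disk failure lands in the first case, which is why the cluster keeps working.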
>>> On Mar 2, 2020, at 7:00 PM, 张祥 <xiangzhang1128@gmail.com> wrote:
>>>> Hi community,
>>>> I ran into a disk failure when using Kafka, and fortunately it did not
>>>> crash the entire cluster. So I am wondering how Kafka handles multiple
>>>> disks and how it manages to keep working in case of a single disk
>>>> failure. The more detailed, the better. Thanks!
