kafka-users mailing list archives

From Tom Bentley <tbent...@redhat.com>
Subject Re: Intended behavior when a broker loses its log volume
Date Mon, 12 Oct 2020 09:41:55 GMT
Hi Peter,

When an unexpected IOException occurs while accessing a file in a log
directory, the broker takes that log directory offline. That means
follower fetchers for partitions on that log dir are stopped, the
broker stops serving requests from those logs, and a notification is
sent to ZooKeeper. When the controller receives the notification, it queries
the broker (via a LeaderAndIsrRequest) to find out exactly which partitions
are affected; those where the response error code indicates a storage error
will have new leaders elected via the usual mechanism. When a broker has no
more log directories online, it will exit.
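
To make the sequence above concrete, here is a toy Python model of it. This
is only a sketch, not Kafka code: the class name, method name, paths and
partition names are all made up for illustration, and the ZooKeeper
notification and controller round trip are reduced to a return value.

```python
class LogDirFailureModel:
    """Toy model of a broker's log directories (hypothetical, not Kafka code)."""

    def __init__(self, log_dirs):
        # log_dirs maps a log directory path to the partitions stored in it.
        self.online = {path: set(parts) for path, parts in log_dirs.items()}
        self.offline = {}
        self.running = True

    def handle_io_error(self, log_dir):
        """Take log_dir offline and return the partitions needing new leaders."""
        if log_dir not in self.online:
            return set()  # already offline or unknown: nothing more to do
        affected = self.online.pop(log_dir)
        self.offline[log_dir] = affected
        if not self.online:
            # No log directories left online: the broker exits.
            self.running = False
        return affected
```

With two log dirs, losing one only affects the partitions stored on it and
the model keeps running; with a single log dir, the same failure stops it.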

So, in the setup you describe, I would expect the broker to have lost
leadership only of those partitions which were on the affected log
dir. In a broker with a single log directory, the broker would exit. So the
behaviour is not the same, but it prioritises availability if the broker is
able to continue functioning with the remaining volumes. As an
administrator, you'd have to notice the loss of the volume and restart the
broker manually.
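
One way an administrator might notice a lost volume is to probe each
directory from outside the broker. The following is a minimal Python sketch
under stated assumptions: the function name is hypothetical, and it assumes
that briefly creating a temporary file in the probed directory is acceptable.
You may prefer to probe a separate directory on the same volume rather than
the Kafka log dir itself.

```python
import os
import tempfile


def unwritable_log_dirs(log_dirs):
    """Return the directories in which a temporary file cannot be created."""
    failed = []
    for path in log_dirs:
        try:
            # The probe file is removed automatically when the block exits.
            with tempfile.NamedTemporaryFile(dir=path, prefix=".probe-"):
                pass
        except OSError:
            # Missing mount point, read-only filesystem, I/O error, etc.
            failed.append(path)
    return failed
```

Run periodically from a monitoring job, a non-empty result would be the cue
to investigate the volume and restart the broker.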



On Sun, Oct 11, 2020 at 8:40 PM Peter Bukowinski <pmbuko@gmail.com> wrote:

> Greetings, all.
> What is the expected behavior of a broker when it loses its only
> configured data log directory?
> I’m running Kafka 2.2.1 in AWS and we had an outage caused by the loss of
> an attached volume on one of the brokers. The broker did not relinquish
> leadership of its topic partitions when this occurred, so it caused an
> outage that was only mitigated after we restarted the broker, forcing
> leadership changes. I run Kafka on bare metal with JBOD data dirs, and
> losing a disk in those clusters does not cause an outage.
> I’m curious what I should expect with only one storage location per broker.
> —
> Peter Bukowinski
