kafka-users mailing list archives

From Jason Rosenberg <...@squareup.com>
Subject Re: Broker crashes when no space left for log.dirs
Date Thu, 15 Aug 2013 17:18:26 GMT
A related question: will producers that send messages with acknowledgment
get a failed ack if a broker is out of disk space, or will the messages be
buffered in memory successfully (resulting in a good ack before they fail to
be written)?

It seems like it might be a good feature to have the broker auto-detect when
its log dir is nearing full, so that there is some runway to shut down
gracefully while still writing any messages buffered in memory.  It could be
an optional threshold, like 98% full, or X MB free, etc.
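
To make the threshold idea concrete, here is a rough sketch of the check I
have in mind. This is not Kafka code: the function name, the parameters, and
the defaults are all made up for illustration, and the broker would still
need its own logic for flushing and shutting down once the check trips.

```python
import shutil


def should_shut_down(log_dir, max_used_fraction=0.98, min_free_bytes=None,
                     usage_fn=shutil.disk_usage):
    """Return True when log_dir has crossed either (hypothetical) threshold.

    max_used_fraction: trip when this fraction of the filesystem is in use.
    min_free_bytes:    optionally trip when fewer bytes than this are free.
    usage_fn:          injectable for testing; must return an object with
                       .total and .free attributes, like shutil.disk_usage.
    """
    usage = usage_fn(log_dir)
    if usage.total <= 0:
        # Cannot compute a fraction; treat an unreadable size as "full".
        return True

    # Count everything that is not free as used, including reserved blocks.
    used_fraction = (usage.total - usage.free) / usage.total
    if used_fraction >= max_used_fraction:
        return True

    if min_free_bytes is not None and usage.free < min_free_bytes:
        return True

    return False
```

A periodic task could call this on each configured log dir and start a
controlled shutdown the first time it returns True, instead of waiting for
a write to fail.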


On Wed, Aug 14, 2013 at 7:58 PM, Jay Kreps <jay.kreps@gmail.com> wrote:

> The crash is actually just a call to shutdown. We think this is the right
> thing to do, though I agree it is unintuitive. Here is why. When you get an
> out of space error it is likely that the operating system did a partial
> write, leaving you with a corrupt log. Furthermore it is possible that
> space will free up at which point more writes on the log could succeed so
> you wouldn't even know there was a problem but all your consumers would hit
> this data and choke.
> By "crashing" the node we ensure that recovery is run on the log to bring
> it into a consistent state.
> Theoretically we could leave the node up accepting reads but rejecting
> writes while attempting to recover the log. But there are a bunch of
> problems with this, and it is very complex. Likely if you are out of
> space you are just going to keep getting writes, and running out of space
> again and then running recovery and so on. This kind of crazy loop is much
> worse than just needing to bring the node back up.
> Alternately we could leave the node up but go into some kind of
> write-rejecting mode forever. But this would still require that you restart
> the node, and we would have to implement that write-rejecting mode.
> Cheers,
> -Jay
> On Wed, Aug 14, 2013 at 9:52 AM, Bryan Baugher <bjbq4d@gmail.com> wrote:
> > Hi,
> >
> > This is more of a thought question than a problem that I need support
> > for. I have been trying out Kafka 0.8.0-beta1 with replication. For our
> > use case we want to try and guarantee that our consumers will see all
> > messages even if they have fallen greatly behind the broker/producer.
> > For this reason I wanted to know how the broker would react when the
> > filesystem it writes its messages to is full. What I found was that the
> > broker crashes and cannot be started until the filesystem has space
> > again.
> >
> > Is there or would it make sense to provide configuration allowing the
> > broker to reject writes in this case rather than crashing, electing a new
> > leader and attempting the write again? I can clearly understand the use
> > case that we don't want to 'lose' messages from the producer and I could
> > also see how lack of filesystem space could be considered a machine
> > failure, but with replication I would think if you are running out of
> > space on 1 broker you are likely running out of space on others.
> >
> > Bryan
> >
