kafka-users mailing list archives

From Jonathan Bethune <jonathan.beth...@instaclustr.com>
Subject Re: Spread log segment deletion over a couple hours
Date Thu, 03 May 2018 06:20:56 GMT
Howdy Vincent.

Sounds like a painful situation! I have experienced similar drama with
Kafka so maybe I can offer some advice.

You said you decreased the retention time on 4 topics. I wonder, was this
done on all 4 topics at the same time?

Depending on broker and partition config, that can be very painful. With
Kafka you can configure log deletion settings at the topic level.

In the future you should consider doing these sorts of changes one topic at
a time unless there is some compelling reason to do them simultaneously.
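For example, a per-topic retention override can be applied with the kafka-configs tool that ships with Kafka. A sketch, assuming a topic named topic-one and a local ZooKeeper (newer Kafka versions take --bootstrap-server <broker:9092> instead of --zookeeper):

```shell
# Lower retention on ONE topic at a time; topic name and address are placeholders.
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name topic-one \
  --add-config retention.ms=2592000000   # 30 days in milliseconds

# Verify the override took effect before moving on to the next topic.
bin/kafka-configs.sh --zookeeper localhost:2181 --describe \
  --entity-type topics --entity-name topic-one
```

Doing it this way keeps each topic's deletion burst isolated, and you can watch disk I/O settle before touching the next one.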

You also wrote that you saw a spike in CPU load and disk usage. There are a
number of ways you can configure log cleanup so as to use less disk space
and CPU.

You can reduce retention.bytes or retention.ms so that segments become
eligible for deletion in smaller, more frequent batches, based on log size
and age respectively. You can also throttle the log cleaner (which handles
compacted topics) by setting log.cleaner.threads and
log.cleaner.io.max.bytes.per.second.

Check the documentation for all the relevant config.
<https://kafka.apache.org/documentation/>
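A minimal broker-side sketch in server.properties (the values are illustrative, not recommendations — tune them for your hardware):

```properties
# Throttle the log cleaner (compaction) so it competes less with client traffic
log.cleaner.threads=1
log.cleaner.io.max.bytes.per.second=10485760

# For delete-based cleanup, this controls how often Kafka checks for
# segments that are eligible for deletion (default is 5 minutes)
log.retention.check.interval.ms=300000
```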

Again, consider setting all of this at the topic level, especially if your
topics differ a lot in how much disk and CPU they use.
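To spread the deletions out rather than taking the full retention cut in one go, you could also step retention.ms down gradually. A rough sketch — topic name, ZooKeeper address, step values, and pause length are all placeholders:

```shell
# Hypothetical stepwise decrease from 8 weeks to 30 days, one week at a time,
# pausing between steps so segment deletions are spread over several hours.
for ms in 4838400000 4233600000 3628800000 3024000000 2592000000; do
  bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
    --entity-type topics --entity-name topic-one \
    --add-config retention.ms=$ms
  sleep 3600   # give the brokers time to delete the newly eligible segments
done
```

Each iteration only makes about one week of segments eligible for deletion, so the disk I/O hit per step is a fraction of what a single large cut causes.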

I hope that helps you out a bit. Good luck!

On 3 May 2018 at 03:52, Vincent Rischmann <vincent@rischmann.fr> wrote:

> Hi,
>
> I'm wondering if there is a way to tell Kafka to spread the log file
> deletion when decreasing the retention time of a topic, and if not, if
> it would make sense.
> I'm asking because this afternoon, after decreasing the retention time
> from 2 months to 1 month on 4 of my topics, the whole cluster became
> overloaded for approximately 15 minutes (every broker with 25+ load,
> disk usage almost 100%), with leader reelection, under replicated
> partitions, and a bunch of consumers unable to make progress.
> The change removed 5 TiB of data across the 4 topics, and I didn't check
> beforehand to see how it would affect disk i/o, so it's on me that
> this happened, but seeing how much data was removed I think it would
> make sense to delete only a couple segments at a time in order to not
> overload the disks.
> Right now I can only be careful and plan the decrease in small steps but
> that's going to be a little tedious.
> How does everyone deal with this?
>



-- 

Jonathan Bethune - Senior Consultant

JP: +81 70 4069 4357

