kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: Why is segment.ms=10m for repartition topics in KafkaStreams?
Date Tue, 09 Oct 2018 18:47:06 GMT
Hi Niklas,

The default value of segment.ms for repartition topics was set to 10min as
part of this work (introduced in Kafka 1.1.0):

https://jira.apache.org/jira/browse/KAFKA-6150

https://cwiki.apache.org/confluence/display/KAFKA/KIP-204+%3A+Adding+records+deletion+operation+to+the+new+Admin+Client+API

In KIP-204 (KAFKA-6150), we added an admin request to delete records
right after each offset commit, to make repartition topics really
"transient", and along with it we set the default segment.ms to 10min.
The rationale is that to make record purging effective, we need smaller
segments, so that a segment file can be deleted promptly once the purged
offset is larger than that segment's last offset.
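
To illustrate the rationale above, here is a minimal sketch in plain Python
(not broker code). The Segment class and both helper functions are
hypothetical names made up for this example, and rolling by record count
stands in for rolling by segment.ms. It assumes a segment can be removed
only when all of its records are below the purged offset, and that the
newest (active) segment is never removed.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int  # first offset stored in the segment
    last_offset: int  # last offset stored in the segment


def roll_segments(total_records, records_per_segment):
    """Split offsets [0, total_records) into consecutive segments."""
    segments = []
    for base in range(0, total_records, records_per_segment):
        last = min(base + records_per_segment, total_records) - 1
        segments.append(Segment(base, last))
    return segments


def deletable(segments, purge_offset):
    """Return segments whose last offset is below the purged offset.

    The newest segment is treated as active and is never returned.
    """
    return [s for s in segments[:-1] if s.last_offset < purge_offset]


# Same data (6000 records) and same purged offset (4500), different roll size.
small = roll_segments(6000, 1000)  # frequent rolls: 6 segments
large = roll_segments(6000, 6000)  # a single large segment

print(len(deletable(small, 4500)))  # 4
print(len(deletable(large, 4500)))  # 0
```

With frequent rolls, most of the purged data sits in closed segments that
can be removed; with one large segment, nothing can be removed even though
three quarters of its records are already purged.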


Which Kafka version are you currently using? Did you observe that data
purging did not happen (otherwise segment files should be garbage collected
quickly)? Or is your traffic very low, or are commits infrequent, so that
purging is ineffective?


Guozhang



On Tue, Oct 9, 2018 at 4:07 AM, Niklas Lönn <niklas.lonn@gmail.com> wrote:

> Hi,
>
> Recently we experienced a problem when resetting a streams application,
> doing quite a lot of operations based on 2 compacted source topics, with 20
> partitions.
>
> We crashed the entire broker cluster with a TooManyOpenFiles exception (we
> already have a multi-million limit).
>
> When inspecting the internal topics configuration I noticed that the
> repartition topics have a default config of:
> Configs:segment.bytes=52428800,segment.index.bytes=52428800,cleanup.policy=delete,segment.ms=600000
>
> My source topic is a compacted topic used as a KTable, and let's assume I
> have data for every segment of 10min, I would quickly get 144 segments
> per partition per day.
>
> Since this repartition topic is not even compacted, I can't understand the
> reasoning behind having a default of 10min segment.ms and 50MB
> segment.bytes?
>
> Is there any best practice regarding this? Potentially we could crash the
> cluster every time we need to reset an application.
>
> And does it make sense that it would keep so many open files at the same
> time in the first place? Could it be a bug in file management of the Kafka
> broker?
>
> Kind regards
> Niklas
>
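
For reference, the segment count in the quoted message can be estimated
with a short calculation. This is only a sketch of an upper bound: it
assumes data arrives in every 10-minute window and that nothing is purged
or deleted during the day. The function names, the figure of 3 files per
segment (.log, .index, .timeindex) and the count of 10 repartition topics
are assumptions for illustration, not values from this thread.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def segments_per_day(segment_ms):
    """Upper bound on time-based segment rolls per partition per day."""
    return MS_PER_DAY // segment_ms


def files_per_day(segment_ms, partitions, topics, files_per_segment=3):
    """Upper bound on segment files created per day across the topics.

    files_per_segment=3 is an assumption (.log, .index, .timeindex).
    """
    return segments_per_day(segment_ms) * partitions * topics * files_per_segment


per_partition = segments_per_day(600_000)         # segment.ms=600000
per_topic = per_partition * 20                    # 20 partitions, as in the thread
total_files = files_per_day(600_000, 20, 10)      # 10 topics is hypothetical

print(per_partition)  # 144
print(per_topic)      # 2880
print(total_files)    # 86400
```

If purging keeps up with the commits, most of these segments are removed
shortly after they are rolled, so the number of files open at any one time
should stay far below this bound.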



