kafka-users mailing list archives

From Murilo Tavares <murilo...@gmail.com>
Subject Re: KafkaStreams - impact of retention on repartition topics
Date Mon, 26 Aug 2019 14:23:11 GMT
Cool! Thank you, Matthias!


On Sun, 25 Aug 2019 at 15:11, Matthias J. Sax <matthias@confluent.io> wrote:

> You cannot delete arbitrary data; however, it's possible to send a
> "truncate request" to the brokers to delete data before the retention
> time is reached:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-107%3A+Add+deleteRecordsBefore%28%29+API+in+AdminClient
>
> There is an `AdminClient#deleteRecords(...)` API to do so.
>
>
> -Matthias
>
> On 8/21/19 9:09 PM, Murilo Tavares wrote:
> > Thanks Matthias for the prompt response.
> > Now, just out of curiosity, how does that work? I thought it was not
> > possible to easily delete topic data...
> >
> >
> > On Wed, Aug 21, 2019 at 4:51 PM Matthias J. Sax <matthias@confluent.io> wrote:
> >
> >> No need to worry about this.
> >>
> > >> Kafka Streams uses "purge data" calls to actively delete data from
> > >> those topics after the records are processed. Hence, those topics won't
> > >> grow without bound but are "truncated" on a regular basis.
> >>
> >>
> >> -Matthias
> >>
> >> On 8/21/19 11:38 AM, Murilo Tavares wrote:
> >>> Hi
> > >>> I have a complex KafkaStreams topology with a bunch of KTables
> > >>> that I regroup (rekey) and aggregate so I can join them.
> >>> I've noticed that the "-repartition" topics created by the groupBy
> >>> operations have a very long retention by default (Long.MAX_VALUE).
> > >>> I'm a bit concerned about the size of these topics, as they will retain
> > >>> data forever. Why is the retention so long, and what would be the
> > >>> impact of reducing it?
> >>> Thanks
> >>> Murilo
> >>>
> >>
> >>
> >
>
>
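For reference, here is a minimal sketch of the `AdminClient#deleteRecords(...)` call mentioned above. It assumes kafka-clients is on the classpath and a broker is reachable; the bootstrap address, the topic name `my-topic`, partition 0, and offset 42 are hypothetical placeholders. The request removes every record in that partition with an offset below 42 by advancing the partition's log start offset (low watermark), which is why only a prefix of a partition can be deleted, not arbitrary records.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DeleteRecordsResult;
import org.apache.kafka.clients.admin.DeletedRecords;
import org.apache.kafka.clients.admin.RecordsToDelete;
import org.apache.kafka.common.KafkaFuture;
import org.apache.kafka.common.TopicPartition;

public class TruncatePartition {

    // Builds a deleteRecords request: for each partition, all records with an
    // offset strictly below the given offset are marked for deletion.
    static Map<TopicPartition, RecordsToDelete> buildRequest(Map<TopicPartition, Long> beforeOffsets) {
        Map<TopicPartition, RecordsToDelete> request = new HashMap<>();
        beforeOffsets.forEach((tp, offset) -> request.put(tp, RecordsToDelete.beforeOffset(offset)));
        return request;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical values: replace with your own cluster, topic, and offset.
        String bootstrap = "localhost:9092";
        TopicPartition tp = new TopicPartition("my-topic", 0);
        Map<TopicPartition, Long> beforeOffsets = Collections.singletonMap(tp, 42L);

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);

        // AdminClient is AutoCloseable, so try-with-resources releases its threads.
        try (AdminClient admin = AdminClient.create(props)) {
            DeleteRecordsResult result = admin.deleteRecords(buildRequest(beforeOffsets));
            // Each future completes with the partition's new low watermark
            // (log start offset) once the broker has applied the truncation.
            for (Map.Entry<TopicPartition, KafkaFuture<DeletedRecords>> e : result.lowWatermarks().entrySet()) {
                System.out.println(e.getKey() + " low watermark: " + e.getValue().get().lowWatermark());
            }
        }
    }
}
```

Kafka Streams already issues such purge requests for its own repartition topics, as described in the thread, so a manual call like this is only relevant for topics you manage yourself.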
