kafka-users mailing list archives

From "Matthias J. Sax" <matth...@confluent.io>
Subject Re: KafkaStreams - impact of retention on repartition topics
Date Sun, 25 Aug 2019 19:11:14 GMT
You cannot delete arbitrary data. However, it is possible to send a
"truncate request" to the brokers to delete data before the retention
time is reached:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-107%3A+Add+deleteRecordsBefore%28%29+API+in+AdminClient

The `AdminClient#deleteRecords(...)` API can be used to do so.
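As a minimal sketch of such a truncate request (the topic name
"my-topic", partition 0, and offset 100 are placeholders, and
"localhost:9092" is assumed as the bootstrap server):

```java
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DeleteRecordsResult;
import org.apache.kafka.clients.admin.RecordsToDelete;
import org.apache.kafka.common.TopicPartition;

public class TruncateExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Delete all records with offsets smaller than 100 in
            // partition 0 of "my-topic". Records at or after offset 100
            // are kept; consumers positioned before the new log start
            // offset will be moved forward.
            TopicPartition tp = new TopicPartition("my-topic", 0);
            Map<TopicPartition, RecordsToDelete> toDelete =
                Map.of(tp, RecordsToDelete.beforeOffset(100L));

            DeleteRecordsResult result = admin.deleteRecords(toDelete);
            result.all().get(); // block until the brokers acknowledge
        }
    }
}
```

Note this only advances the log start offset; it does not remove
individual records from the middle of a partition.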


-Matthias

On 8/21/19 9:09 PM, Murilo Tavares wrote:
> Thanks Matthias for the prompt response.
> Now just out of curiosity, how does that work? I thought it was not
> possible to easily delete topic data...
> 
> 
> On Wed, Aug 21, 2019 at 4:51 PM Matthias J. Sax <matthias@confluent.io>
> wrote:
> 
>> No need to worry about this.
>>
>> Kafka Streams uses "purge data" calls to actively delete data from
>> those topics after the records have been processed. Hence, those topics
>> won't grow unbounded but are "truncated" on a regular basis.
>>
>>
>> -Matthias
>>
>> On 8/21/19 11:38 AM, Murilo Tavares wrote:
>>> Hi
>>> I have a complex KafkaStreams topology, where I have a bunch of KTables
>>> that I regroup (rekeying) and aggregate so I can join them.
>>> I've noticed that the "-repartition" topics created by the groupBy
>>> operations have a very long retention by default (Long.MAX_VALUE).
>>> I'm a bit concerned about the size of these topics, as they will retain
>>> data forever. I wonder why the retention is so long, and what the impact
>>> of reducing it would be?
>>> Thanks
>>> Murilo
>>>
>>
>>
> 

