kafka-users mailing list archives

From Heath Ivie <hi...@AutoAnything.com>
Subject RE: Log Retention: What gets deleted
Date Fri, 08 Apr 2016 18:31:54 GMT
Gwen,

Thanks for the detailed reply.

That makes it more clear for me.

Heath

-----Original Message-----
From: Gwen Shapira [mailto:gwen@confluent.io] 
Sent: Tuesday, April 05, 2016 6:13 PM
To: users@kafka.apache.org
Subject: Re: Log Retention: What gets deleted

I think you got it almost right. The missing part is that we only delete whole partition segments,
not individual messages.

As you are writing messages, every X bytes or Y milliseconds, a new file gets created for
the partition to store new messages in. Those files are called segments.
The segment you are currently writing to is an active segment.

We will never delete an active segment, so in order to delete old messages we will look for
an inactive segment where the newest message is older than our retention and delete the entire
segment.
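
The whole-segment rule described above can be sketched as follows. This is a minimal illustration, not Kafka's actual implementation: the `Segment` class and `deletable_segments` function are hypothetical names, and the last segment in the list is assumed to be the active one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for one segment file of a partition."""
    base_offset: int
    newest_timestamp_ms: int  # timestamp of the newest message in the segment


def deletable_segments(segments: List[Segment], now_ms: int,
                       retention_ms: int) -> List[Segment]:
    """Return the inactive segments whose newest message is past retention.

    Assumption: segments are ordered oldest to newest and the last one is
    the active segment, which is never eligible for deletion.
    """
    inactive = segments[:-1]
    return [s for s in inactive
            if now_ms - s.newest_timestamp_ms > retention_ms]


segments = [Segment(0, 1_000), Segment(10, 5_000), Segment(20, 9_000)]
expired = deletable_segments(segments, now_ms=10_000, retention_ms=2_000)
print([s.base_offset for s in expired])
```

Note that a segment is only eligible once its newest message has expired, so older messages in the same segment can outlive the retention time by a wide margin.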

So there are several parameters controlling when data gets deleted (I'm looking at just
the time-based ones, not the size-based ones):
1. log.retention.ms - how old messages should be before we consider them for deletion
2. log.roll.ms - how frequently we roll new segments. Messages will not get deleted before
a new segment is rolled
3. log.retention.check.interval.ms - how frequently we check for segments that we can delete.

A message will be deleted if all 3 are true:
1. It is older than log.retention.ms
2. It is in an inactive segment, meaning enough time passed since the message was written
to roll a new segment
3. Kafka checked for segments that can be deleted, meaning that more than check.interval.ms
time passed since the segment was rolled.
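
To see how these three conditions interact, here is a rough simulation of the scenario from the original question (10 messages per second, 1000 ms retention). It is a simplified sketch, not broker code: the `simulate` function and its parameters are hypothetical, rolling is modeled purely on elapsed time, and the roll and check intervals used in the example are assumed values chosen for illustration.

```python
def simulate(duration_ms, retention_ms, roll_ms, check_interval_ms,
             msg_interval_ms=100):
    """Return (time_ms, retained_message_count) at each retention check.

    Assumption: check_interval_ms is a multiple of msg_interval_ms.
    """
    segments = [[]]      # each segment is a list of timestamps; last is active
    segment_start = 0
    samples = []
    for now in range(0, duration_ms + 1, msg_interval_ms):
        # Roll a new segment once the active one has been open for roll_ms.
        if now - segment_start >= roll_ms and segments[-1]:
            segments.append([])
            segment_start = now
        segments[-1].append(now)
        # Periodic retention check: drop whole inactive segments only.
        if now > 0 and now % check_interval_ms == 0:
            inactive, active = segments[:-1], segments[-1]
            kept = [s for s in inactive if now - s[-1] <= retention_ms]
            segments = kept + [active]
            samples.append((now, sum(len(s) for s in segments)))
    return samples


samples = simulate(duration_ms=10_000, retention_ms=1_000,
                   roll_ms=5_000, check_interval_ms=1_000)
for time_ms, retained in samples:
    print(time_ms, retained)
```

In this model the retained count keeps growing while messages accumulate in the active segment, then drops sharply once that segment has been rolled and has aged past the retention time, which matches the growing-then-vanishing lag described in the question.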

Hope this helps,

Gwen



On Fri, Apr 1, 2016 at 12:21 PM, Heath Ivie <hivie@autoanything.com> wrote:

> Hi,
>
> I have some questions about the log retention and specifically what 
> gets deleted.
>
> I have a test app where I am writing 10 logs to the topic every second.
>
> What I would expect is that the lag in a group would be somewhere around 10
> if I have retention.ms at 1000.
>
> What I am seeing is that the lag continues to grow, but then at some
> point all messages are gone and the lag is at 0.
>
> I thought that the messages that are old would be deleted first.
>
> Am I misinterpreting how the log retention works?
>
> Heath Ivie
> Solutions Architect