kafka-users mailing list archives

From Jay Kreps <jay.kr...@gmail.com>
Subject Re: undesirable log retention behavior
Date Fri, 01 Aug 2014 05:10:37 GMT
This is a real problem you describe. Unfortunately, adding the
timestamp to the file name won't help in your case: replicas don't
interact with files directly, they just fetch messages by offset, so
there is really no clean way for them to get modification times from
the source broker.
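
To make the failure mode concrete, here is a minimal Python sketch (not Kafka's actual code) of a retention check that, as described in this thread, looks only at each segment file's last-modified time on the local disk. The `Segment` class and `expired` function are hypothetical. Because a replica rebuilds its segments from messages fetched by offset, its copies carry fresh local mtimes and survive a full retention window again.

```python
from dataclasses import dataclass

HOUR_MS = 60 * 60 * 1000


@dataclass
class Segment:
    # Hypothetical model of one log segment file on a broker's disk.
    base_offset: int  # first message offset in the segment
    mtime_ms: int     # file last-modified time on THIS broker's disk


def expired(segments, now_ms, retention_ms):
    """Return segments whose local mtime is older than the retention window.

    Assumption: retention uses only the local file's last-modified time,
    as described in this thread.
    """
    return [s for s in segments if now_ms - s.mtime_ms > retention_ms]


retention_ms = 24 * HOUR_MS
now_ms = 100 * HOUR_MS

# On the source broker these segments were written 30 and 10 hours ago.
leader = [Segment(0, now_ms - 30 * HOUR_MS), Segment(1000, now_ms - 10 * HOUR_MS)]

# A new replica fetches the same messages by offset and writes them locally,
# so every copied segment gets a fresh mtime (here: one hour ago).
replica = [Segment(s.base_offset, now_ms - 1 * HOUR_MS) for s in leader]

print([s.base_offset for s in expired(leader, now_ms, retention_ms)])   # [0]
print([s.base_offset for s in expired(replica, now_ms, retention_ms)])  # []
```

The replica holds exactly the same messages as the leader, yet nothing on it is eligible for deletion.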

I think a quicker approach here would be to make the size limits more
usable. We will have to think about how best to correct the problem
you describe.
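
A size cap sidesteps the timestamp question because it only needs segment order, which a replica does have (segments are ordered by offset). The sketch below is a hypothetical oldest-first policy, not Kafka's exact deletion rule; Kafka does expose a per-partition byte limit as `log.retention.bytes`, but its precise semantics may differ from this function.

```python
def size_based_deletions(segment_sizes, limit):
    """Pick the oldest segments to delete so the log fits under `limit`.

    segment_sizes: list of (base_offset, size) pairs, in any order.
    Hypothetical policy: delete oldest-first by offset and never delete the
    newest (active) segment. No timestamps are involved, so a leader and a
    freshly bootstrapped replica make the same decision.
    """
    ordered = sorted(segment_sizes)         # oldest (lowest offset) first
    total = sum(size for _, size in ordered)
    to_delete = []
    for base_offset, size in ordered[:-1]:  # keep the active segment
        if total <= limit:
            break
        to_delete.append(base_offset)
        total -= size
    return to_delete


# Sizes in GB for readability: 650 GB on disk, 300 GB cap.
segments = [(2000, 200), (0, 200), (3000, 50), (1000, 200)]
print(size_based_deletions(segments, 300))  # [0, 1000]
```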

-Jay

On Thu, Jul 31, 2014 at 6:51 PM, Steven Wu <stevenwu@netflix.com.invalid> wrote:
> It seems that log retention is based purely on the last touch/modified
> timestamp. This is undesirable for code pushes in AWS/cloud.
>
> E.g., let's say the retention window is 24 hours, disk size is 1 TB, and
> disk utilization is 60% (600 GB). When a new instance comes up, it fetches
> the log files (600 GB) from its peers. Those log files all have newer
> timestamps, so they won't be purged until 24 hours later. Note that during
> those first 24 hours, new messages (another 600 GB) continue to come in,
> which can fill the disk without any intervention. With this behavior, we
> have to keep disk utilization under 50%.
>
> Can the last-modified timestamp be inserted into the file name when rolling
> over log files? Then Kafka could check the file name for the timestamp.
> Does this make sense?
>
> Thanks,
> Steven
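
The disk arithmetic in Steven's scenario can be checked with a few lines of Python. This is a back-of-the-envelope sketch; `peak_disk_gb` is a hypothetical helper, and it assumes (as the scenario does) that nothing copied to the new instance is purged during the first retention window while ingest continues at a steady rate.

```python
def peak_disk_gb(existing_gb, ingest_gb_per_hour, retention_hours):
    """Worst-case disk use on a freshly bootstrapped replica.

    Assumption (from the scenario above): all copied data looks brand new,
    so nothing is purged until retention_hours have passed, while new
    messages keep arriving for that whole window.
    """
    return existing_gb + ingest_gb_per_hour * retention_hours


disk_gb = 1000
existing_gb = 600            # 60% utilization
ingest_per_hour = 600 / 24   # about 600 GB arrives per 24-hour window
peak = peak_disk_gb(existing_gb, ingest_per_hour, 24)
print(peak, peak <= disk_gb)  # 1200.0 False

# In steady state the data on disk is about one retention window of ingest,
# so peak is roughly 2 x existing, which fits only if existing <= 50% of disk.
max_safe_existing = disk_gb / 2
print(max_safe_existing)  # 500.0
```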
