kafka-users mailing list archives

From Steven Wu <steve...@netflix.com.INVALID>
Subject undesirable log retention behavior
Date Fri, 01 Aug 2014 01:51:38 GMT
It seems that log retention is based purely on each log file's last-modified
timestamp. This is undesirable for code pushes in AWS/cloud environments.

For example, say the retention window is 24 hours, the disk size is 1 TB, and
disk utilization is 60% (600 GB). When a new instance comes up, it fetches the
log files (600 GB) from its peers. Those log files all get new timestamps, so
they won't be purged until 24 hours later. During those first 24 hours, new
messages (another 600 GB) continue to come in. This can fill the disk unless
someone intervenes. With this behavior, we have to keep disk utilization
under 50%.
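
To make the arithmetic above concrete, here is a rough sketch in Python. It
assumes 1 TB is treated as 1000 GB and that one retention window of incoming
data equals the steady-state data size, as in the example. The function names
are made up for illustration and are not part of Kafka.

```python
def peak_disk_usage_gb(replicated_gb, ingest_gb_per_window):
    """Peak usage during the first retention window on a fresh instance.

    Assumes nothing is purged in that window, because every file fetched
    from peers carries a fresh last-modified time.
    """
    return replicated_gb + ingest_gb_per_window


def max_safe_utilization(disk_gb):
    """Highest steady-state utilization that still survives a fresh sync.

    Steady-state data of size s peaks at 2 * s, so s must not exceed half
    the disk.
    """
    return (disk_gb / 2) / disk_gb


disk_gb = 1000                           # 1 TB disk, treated as 1000 GB
peak_gb = peak_disk_usage_gb(600, 600)   # 600 GB fetched + 600 GB new
print(peak_gb, peak_gb > disk_gb)        # 1200 True
print(max_safe_utilization(disk_gb))     # 0.5
```

The peak of 1200 GB exceeds the 1000 GB disk, which is where the 50% ceiling
comes from.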

Could the last-modified timestamp be inserted into the file name when log
files are rolled over? Then Kafka could check the file name for the timestamp.
Does this make sense?
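
A minimal sketch of what the proposal could look like, in Python for brevity.
The naming scheme here (base offset, then roll time in epoch milliseconds,
then ".log") is purely hypothetical; Kafka's segment files are named by base
offset only, and the helper names below are invented for illustration.

```python
import re

# Hypothetical scheme: <20-digit base offset>.<roll time in epoch ms>.log
SEGMENT_NAME_RE = re.compile(r"^(\d{20})\.(\d+)\.log$")


def segment_file_name(base_offset, rolled_at_ms):
    """Build a file name that carries the roll time alongside the offset."""
    return f"{base_offset:020d}.{rolled_at_ms}.log"


def parse_rolled_at_ms(file_name):
    """Recover the roll time from the file name, ignoring the file's mtime."""
    match = SEGMENT_NAME_RE.match(file_name)
    if match is None:
        raise ValueError(f"not a timestamped segment name: {file_name!r}")
    return int(match.group(2))


def is_expired(file_name, now_ms, retention_ms):
    """Retention check driven by the embedded time, not by last-modified."""
    return now_ms - parse_rolled_at_ms(file_name) > retention_ms


HOUR_MS = 60 * 60 * 1000
rolled_at = 1406857898000  # arbitrary example roll time
name = segment_file_name(42, rolled_at)

# A replica that copies this file 23 hours after the roll keeps the original
# roll time in the name, so the segment expires on the original schedule.
print(name)
print(is_expired(name, rolled_at + 25 * HOUR_MS, 24 * HOUR_MS))
print(is_expired(name, rolled_at + 1 * HOUR_MS, 24 * HOUR_MS))
```

Because the time travels with the name, a file fetched from a peer would keep
its original age on the new instance.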

