kafka-users mailing list archives

From András Serény <sereny.and...@gravityrd.com>
Subject Re: log.retention.size
Date Fri, 30 May 2014 09:45:51 GMT

Sorry for the delay on this.

Yes, that's right -- it'd be just another term in the chain of  'or' 
conditions. Currently it's <time limit> OR <size limit>. With the global 
condition, it would be
<time limit> OR <size limit> OR <global size limit>

In my view, that's fairly simple and intuitive, hence a fine piece of logic.
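
As a quick sketch of that predicate (illustrative Scala; every name below
is a made-up stand-in for the corresponding broker setting, not actual
Kafka code):

// A segment becomes eligible for deletion as soon as ANY of the
// retention conditions is violated.
def eligibleForDeletion(segmentAgeMs: Long, topicBytes: Long,
                        totalBytes: Long, retentionMs: Long,
                        retentionBytes: Long, globalBytes: Long): Boolean =
  segmentAgeMs > retentionMs ||    // <time limit>
  topicBytes > retentionBytes ||   // <size limit>
  totalBytes > globalBytes         // <global size limit>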

Regards,
András

On 5/27/2014 4:34 PM, Jun Rao wrote:
> For log.retention.bytes.per.topic and log.retention.hours.per.topic, the
> current interpretation is that those are tight bounds. In other words, a
> segment is deleted only when one of those thresholds is violated. To
> further satisfy log.retention.bytes.global, the per-topic thresholds may
> no longer be tight, i.e., we may need to delete a segment even when the
> per-topic threshold is not violated.
>
> Thanks,
>
> Jun
>
>
> On Tue, May 27, 2014 at 12:22 AM, András Serény
> <sereny.andras@gravityrd.com> wrote:
>
>> No, I think more specific settings should get a chance first. I'm
>> suggesting that, provided there is a segment rolled for a topic, *any*
>> of log.retention.bytes.per.topic, log.retention.hours.per.topic, and a
>> future log.retention.bytes.global violation would cause segments to be
>> deleted.
>>
>> As far as I understand, the current logic says
>>
>> (1)
>> for each topic, if there is a segment already rolled {
>>     mark segments eligible for deletion due to
>>         log.retention.hours.for.this.topic
>>     if log.retention.bytes.for.this.topic is still violated, mark
>>         segments eligible for deletion due to
>>         log.retention.bytes.for.this.topic
>> }
>>
>> After this cleanup cycle, there could be another one, taking into
>> account the global threshold. For instance, something along the lines of
>>
>> (2)
>> if after (1) log.retention.bytes.global is still violated,
>> for each topic, if there is a segment already rolled {
>>     calculate the required size for this topic (e.g. the proportional
>>         size, or simply (full size - threshold)/#topics ?)
>>     mark segments exceeding the required size for deletion
>> }
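>>
>> A rough, runnable rendering of phase (2) (illustrative Scala; the
>> types, the oldest-first segment ordering, and the uniform quota rule
>> are my assumptions, not actual broker code):
>>
>> case class Segment(bytes: Long, rolled: Boolean)
>> case class Topic(name: String, segments: List[Segment]) // oldest first
>>
>> // Returns the segments to delete once the per-topic pass (1) has run
>> // but log.retention.bytes.global is still violated. Each topic is
>> // shrunk towards (its size - excessPerTopic), deleting oldest rolled
>> // segments first; the active (unrolled) segment is never touched.
>> def globalCleanup(topics: List[Topic], globalLimit: Long): List[Segment] = {
>>   val total = topics.flatMap(_.segments).map(_.bytes).sum
>>   if (total <= globalLimit) Nil
>>   else {
>>     val excessPerTopic = (total - globalLimit) / topics.size
>>     topics.flatMap { t =>
>>       val topicSize = t.segments.map(_.bytes).sum
>>       val target = topicSize - excessPerTopic
>>       var remaining = topicSize
>>       t.segments.takeWhile { s =>
>>         val deletable = s.rolled && remaining > target
>>         if (deletable) remaining -= s.bytes
>>         deletable
>>       }
>>     }
>>   }
>> }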
>>
>> Regards,
>> András
>>
>>
>>
>> On 5/23/2014 4:46 PM, Jun Rao wrote:
>>
>>> Yes, that's possible. There is a default log.retention.bytes for every
>>> topic. By introducing a global threshold, we may have to delete data from
>>> logs whose size is smaller than log.retention.bytes. So, are you saying
>>> that the global threshold has precedence?
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>>
>>> On Fri, May 23, 2014 at 2:26 AM, András Serény
>>> <sereny.andras@gravityrd.com> wrote:
>>>
>>>> Hi Kafka users,
>>>> this feature would also be very useful for us. With lots of topics of
>>>> different volume (and as they grow in number) it could become tedious to
>>>> maintain topic level settings.
>>>>
>>>> As a start, I think uniform reduction is a good idea. Logs wouldn't be
>>>> retained as long as you want, but that's already the case when a
>>>> log.retention.bytes setting is specified. As for early rolling, I don't
>>>> think it's necessary: currently, if there is no log segment eligible for
>>>> deletion, log.retention.bytes and log.retention.hours settings won't kick
>>>> in, so it's possible to exceed these limits, which is completely fine
>>>> (please correct me if I'm mistaken here).
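>>>>
>>>> A minimal sketch of that guard (illustrative Scala; the Segment type
>>>> is a made-up stand-in, not broker code):
>>>>
>>>> case class Segment(bytes: Long, ageMs: Long, rolled: Boolean)
>>>>
>>>> // Only already-rolled segments are deletion candidates; the active
>>>> // segment is never deleted, so the time/size limits can legitimately
>>>> // be exceeded until the next roll.
>>>> def deletionCandidates(segments: List[Segment]): List[Segment] =
>>>>   segments.filter(_.rolled)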
>>>>
>>>> All in all, introducing a global threshold doesn't seem to induce a
>>>> considerable change in current retention logic.
>>>>
>>>> Regards,
>>>> András
>>>>
>>>>
>>>> On 5/8/2014 2:00 AM, vinh wrote:
>>>>
>>>>> Agreed…a global knob is a bit tricky for exactly the reason you've
>>>>> identified. Perhaps the problem could be simplified though by
>>>>> considering the context and purpose of Kafka. I would use a
>>>>> persistent message queue because I want to guarantee that
>>>>> data/messages don't get lost. But, since Kafka is not meant to be a
>>>>> long-term storage solution (other products can be used for that), I
>>>>> would clarify that guarantee to apply only to the most recent
>>>>> messages, up until a certain configured threshold (i.e. max 24 hrs,
>>>>> max 500GB, etc.). Once those thresholds are reached, old messages
>>>>> are deleted first.
>>>>>
>>>>> To ensure no message loss (up to a limit), I must ensure Kafka is
>>>>> highly available. There's a small chance that the message deletion
>>>>> rate is the same as the receive rate, for example when the incoming
>>>>> volume is so high that the size threshold is reached before the time
>>>>> threshold. But I may be ok with that, because if Kafka goes down, it
>>>>> can cause upstream applications to fail. This can result in higher
>>>>> losses overall, and particularly of the most *recent* messages.
>>>>>
>>>>> In other words, in a persistent but ephemeral message queue, I would
>>>>> give higher precedence to recent messages over older ones. On the
>>>>> flip side, by allowing Kafka to go down when a disk is full,
>>>>> applications are forced to deal with the issue. This adds complexity
>>>>> to apps, but perhaps it's not a bad thing. After all, in scalability,
>>>>> all apps should be designed to handle failure.
>>>>>
>>>>> Having said that, next is to decide which messages to delete first.
>>>>> I believe that's a separate issue and has its own complexities, too.
>>>>>
>>>>> The main idea though is that a global knob would provide flexibility,
>>>>> even if not used. From an operational perspective, if we can't ensure
>>>>> HA for all applications/components, it would be good if we can for at
>>>>> least some of the core ones, like Kafka. This is much easier said
>>>>> than done though.
>>>>>
>>>>> On May 5, 2014, at 9:16 AM, Jun Rao <junrao@gmail.com> wrote:
>>>>>
>>>>>> Yes, your understanding is correct. A global knob that controls
>>>>>> aggregate log size may make sense. What would be the expected
>>>>>> behavior when that limit is reached? Would you reduce the retention
>>>>>> uniformly across all topics? Then, it just means that some of the
>>>>>> logs may not be retained as long as you want. Also, we need to
>>>>>> think through what happens when every log has only 1 segment left
>>>>>> and yet the total size still exceeds the limit. Do we roll log
>>>>>> segments early?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jun
>>>>>>
>>>>>>
>>>>>> On Sun, May 4, 2014 at 4:31 AM, vinh <vinh@loggly.com> wrote:
>>>>>>
>>>>>>> Thanks Jun. So if I understand this correctly, there really is no
>>>>>>> master property to control the total aggregate size of all Kafka
>>>>>>> data files on a broker.
>>>>>>>
>>>>>>> log.retention.size and log.file.size are great for managing data
>>>>>>> at the application level. In our case, application needs change
>>>>>>> frequently, and performance itself is an ever evolving feature.
>>>>>>> This means various configs are constantly changing, like topics,
>>>>>>> # of partitions, etc.
>>>>>>>
>>>>>>> What rarely changes though is provisioned hardware resources. So a
>>>>>>> setting to control the total aggregate size of Kafka logs (or
>>>>>>> persisted data, for better clarity) would definitely simplify
>>>>>>> things at an operational level, regardless of what happens at the
>>>>>>> application level.
>>>>>>>
>>>>>>>
>>>>>>> On May 2, 2014, at 7:49 AM, Jun Rao <junrao@gmail.com> wrote:
>>>>>>>
>>>>>>>> log.retention.size controls the total size in a log dir (per
>>>>>>>> partition). log.file.size controls the size of each log segment
>>>>>>>> in the log dir.
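>>>>>>>>
>>>>>>>> (To put numbers on it: because the cap applies per partition, a
>>>>>>>> broker hosting N partitions can accumulate up to roughly
>>>>>>>> N * log.retention.size on disk. With
>>>>>>>> log.retention.size=107374182400 (100GB), for instance, two
>>>>>>>> partitions alone can legitimately reach about 200GB in aggregate.)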
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Jun
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, May 1, 2014 at 9:31 PM, vinh <vinh@loggly.com> wrote:
>>>>>>>>
>>>>>>>>> In the 0.7 docs, the description for log.retention.size and
>>>>>>>>> log.file.size sound very much the same. In particular, that they
>>>>>>>>> apply to a single log file (or log segment file).
>>>>>>>>>
>>>>>>>>> http://kafka.apache.org/07/configuration.html
>>>>>>>>>
>>>>>>>>> I'm beginning to think there is no setting to control the max
>>>>>>>>> aggregate size of all logs. If this is correct, what would be a
>>>>>>>>> good approach to enforce this requirement? In my particular
>>>>>>>>> scenario, I have a lot of data being written to Kafka at a very
>>>>>>>>> high rate. So a 1TB disk can easily be filled up in 24hrs or so.
>>>>>>>>> One option is to add more Kafka brokers to add more disk space
>>>>>>>>> to the pool, but I'd like to avoid that and see if I can simply
>>>>>>>>> configure Kafka to not write more than 1TB aggregate. Else,
>>>>>>>>> Kafka will OOM and kill itself, and possibly crash the node
>>>>>>>>> itself because the disk is full.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On May 1, 2014, at 9:21 PM, vinh <vinh@loggly.com> wrote:
>>>>>>>>>
>>>>>>>>>> Using Kafka 0.7.2, I have the following in server.properties:
>>>>>>>>>>
>>>>>>>>>> log.retention.hours=48
>>>>>>>>>> log.retention.size=107374182400
>>>>>>>>>> log.file.size=536870912
>>>>>>>>>>
>>>>>>>>>> My interpretation of this is:
>>>>>>>>>> a) a single log segment file over 48hrs old will be deleted
>>>>>>>>>> b) the total combined size of *all* logs is 100GB
>>>>>>>>>> c) a single log segment file is limited to 500MB in size before
>>>>>>>>>>    a new segment file is spawned
>>>>>>>>>> d) a "log file" can be composed of many "log segment files"
>>>>>>>>>>
>>>>>>>>>> But, even after setting the above, I find that the total
>>>>>>>>>> combined size of all Kafka logs on disk is 200GB right now.
>>>>>>>>>> Isn't log.retention.size supposed to limit it to 100GB? Am I
>>>>>>>>>> missing something? The docs are not really clear, especially
>>>>>>>>>> when it comes to distinguishing between a "log file" and a
>>>>>>>>>> "log segment file".
>>>>>>>>>>
>>>>>>>>>> I have disk monitoring. But like anything else in software,
>>>>>>>>>> even monitoring can fail. Via configuration, I'd like to make
>>>>>>>>>> sure that Kafka does not write more than the available disk
>>>>>>>>>> space. Or something like log4j, where I can set a max number of
>>>>>>>>>> log files and the max size per file, which essentially allows
>>>>>>>>>> me to set a max aggregate size limit across all logs.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> -Vinh
>>>>>>>>>>

