kafka-users mailing list archives

From Inder Pall <inder.p...@gmail.com>
Subject Re: Significance of multiple segment files in a partition
Date Tue, 25 Oct 2011 15:14:54 GMT
Guys,

So is it right to say that the log retention property, set to X days, uses the
last activity on a segment file to determine when to delete that file? If the
file size is set to a large number and the same file keeps getting appended to
on a daily basis, then we won't achieve the 7-day cleanup until either there
has been no activity on the file for 7 days, or it has reached the larger size,
rolled over, and then stayed untouched for 7 days.

On the other hand, a smaller file size ensures that the log rolls over
multiple times in 7 days, so the segments left untouched for 7 days can be
deleted, thus optimizing space usage.
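
To make the trade-off concrete, here is a minimal simulation sketch. It is not
Kafka code: the function name and parameters are hypothetical, it assumes
retention is judged purely by each segment's last-modified day, and it splits
writes exactly at the size limit (the real broker rolls only after a message
pushes the file past the limit).

```python
def simulate_disk_usage(segment_bytes, bytes_per_day, days, retention_days=7):
    """Return bytes left on disk after `days` days of steady appends.

    Each segment is modelled as [size_bytes, last_modified_day].
    Assumption: a segment is deleted as a whole once it has gone
    `retention_days` days without a write.
    """
    segments = []
    for day in range(days):
        remaining = bytes_per_day
        while remaining > 0:
            # Roll to a new segment when there is none or the last one is full.
            if not segments or segments[-1][0] >= segment_bytes:
                segments.append([0, day])
            active = segments[-1]
            written = min(segment_bytes - active[0], remaining)
            active[0] += written
            active[1] = day  # every append refreshes the last-modified day
            remaining -= written
        # Cleanup pass: drop whole segments that have been idle too long.
        segments = [s for s in segments if day - s[1] < retention_days]
    return sum(size for size, _ in segments)


if __name__ == "__main__":
    small = simulate_disk_usage(segment_bytes=100, bytes_per_day=100, days=30)
    large = simulate_disk_usage(segment_bytes=10_000, bytes_per_day=100, days=30)
    print("small segments:", small, "bytes; large segment:", large, "bytes")
```

Under these assumptions the small-segment run retains only the last 7 days of
data, while the large-segment run retains all 30 days, because the single file
is written to every day and never becomes eligible for deletion.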

Are the default settings based on experimentation, and are they recommended
for production use?

- Inder

On Tue, Oct 25, 2011 at 7:53 PM, Neha Narkhede <neha.narkhede@gmail.com> wrote:

> Inder,
>
> >> 2. Why would you want to have multiple files within a partition. Broker has
> >> to store more info to figure the right file among a partition.
>
> There is not much advantage apart from better accuracy with the
> getLatestOffeset API. If you use it to start consuming data close to a
> certain timestamp, you get better accuracy with smaller log files.
>
> >> 3. Is it to achieve mmap kinda optimization and allowing the broker to do
> >> less I/O in case a feed is really huge or any thing else.
>
> Not really. mmap is useful when you have random access on large files, or
> have multiple processes trying to access the same file. It might actually not
> work well with large files if your memory is fragmented. Since we have
> sequential I/O patterns, the filesystem caching itself works very well.
>
> Thanks,
> Neha
>
> On Tuesday, October 25, 2011, Jay Kreps <jay.kreps@gmail.com> wrote:
> > It is actually just to allow data deletion, we just delete whole segments in
> > the cleanup. There is not much value to tuning the file size for most
> > situations, but the tradeoff is that with smaller files you will have more
> > open files but be closer to your desired retention.hours and retention.size
> > settings.
> >
> > -Jay
> >
> > On Tue, Oct 25, 2011 at 1:59 AM, Inder Pall <inder.pall@gmail.com> wrote:
> >
> >> I am playing around with "log.file.size" (controls the size of a segment
> >> file in a partition) and "log.retention.hours" with the following config.
> >> log.file.size=500
> >> log.retention.hours=168
> >>
> >> Observation - i see multiple files getting generated within the same
> >> partition.
> >> Example : my topic name is revenue feed and i see the following
> >>
> >> ls -lh /tmp/kafka-logs/revenuefeed-0/*
> >> -rw-r--r-- 1 inder users 537 Oct 25 01:38 /tmp/kafka-logs/revenuefeed-0/00000000000000000000.kafka
> >> -rw-r--r-- 1 inder users 512 Oct 25 01:39 /tmp/kafka-logs/revenuefeed-0/00000000000000000537.kafka
> >>
> >> Questions
> >> --------------
> >> 1. Shouldn't these two properties go hand in hand?
> >> 2. Why would you want to have multiple files within a partition? Broker has
> >> to store more info to figure the right file among a partition.
> >> 3. Is it to achieve mmap kinda optimization and allowing the broker to do
> >> less I/O in case a feed is really huge, or anything else?
> >>
> >> -- Inder
> >>
> >
>
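
On question 2 in the quoted thread (the broker needing extra information to
find the right file): the listing above suggests each segment file is named
after the byte offset at which it starts, since the first file is 537 bytes
long and the second is named ...537.kafka. Assuming that naming scheme holds,
locating the segment for an offset needs only a sorted list of base offsets.
The sketch below is illustrative; the function names are hypothetical and this
is not the broker's actual implementation.

```python
import bisect


def segment_filename(base_offset):
    """Format a base offset the way the file names in the listing appear:
    zero-padded to 20 digits with a .kafka suffix."""
    return f"{base_offset:020d}.kafka"


def segment_for_offset(base_offsets, offset):
    """Return the base offset of the segment that would contain `offset`.

    `base_offsets` must be sorted ascending. The owning segment is the one
    with the largest base offset that is <= the requested offset.
    """
    index = bisect.bisect_right(base_offsets, offset) - 1
    if index < 0:
        raise ValueError(f"offset {offset} is before the first segment")
    return base_offsets[index]


if __name__ == "__main__":
    bases = [0, 537]  # taken from the two files in the listing
    for wanted in (0, 536, 537, 1000):
        print(wanted, "->", segment_filename(segment_for_offset(bases, wanted)))
```

The lookup is a binary search, so the per-partition bookkeeping stays small
even with many segment files.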



-- 
-- Inder
