kafka-users mailing list archives

From Neha Narkhede <neha.narkh...@gmail.com>
Subject Re: Significance of multiple segment files in a partition
Date Tue, 25 Oct 2011 14:23:46 GMT
Inder,

>> 2. Why would you want to have multiple files within a partition? The broker has
>> to store more info to figure out the right file within a partition.

There is not much advantage apart from better accuracy with the
getLatestOffset API. If you want to start consuming data close to a certain
timestamp, you get better accuracy with smaller log files, since the offsets
returned for a timestamp fall on segment-file boundaries.
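To make the accuracy point concrete, here is a minimal Python sketch. It assumes, for illustration only, that each segment is identified by its starting offset (which is also its file name, as in the ls listing quoted below) and by the file's last-modified time. `Segment`, `find_segment` and `start_offset_before` are hypothetical names, not Kafka APIs. Finding the right file for an offset needs nothing more than a binary search over the sorted start offsets, and a time-based lookup can only answer with a segment boundary.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    base_offset: int      # offset of the first message; also the file name
    last_modified: float  # file mtime, seconds since the epoch


def find_segment(segments: List[Segment], offset: int) -> Segment:
    """Return the segment holding `offset`; `segments` is sorted by base_offset.

    Only the start offsets (the file names) are needed: a binary search picks
    the last segment starting at or before `offset`. Assumes `offset` is not
    past the end of the log.
    """
    bases = [s.base_offset for s in segments]
    i = bisect.bisect_right(bases, offset) - 1
    if i < 0:
        raise ValueError(f"offset {offset} precedes the oldest segment")
    return segments[i]


def start_offset_before(segments: List[Segment], timestamp: float) -> Optional[int]:
    """Base offset of the newest segment last modified before `timestamp`.

    The answer is always a segment boundary, so a consumer starting there may
    re-read older data in whole-segment units; smaller segments shrink that.
    """
    candidates = [s.base_offset for s in segments if s.last_modified < timestamp]
    return candidates[-1] if candidates else None
```

With the two files from the listing, `find_segment` maps offset 600 to the segment starting at 537, and `start_offset_before` can only ever return 0 or 537, never anything in between.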

>> 3. Is it to achieve an mmap-style optimization, allowing the broker to do
>> less I/O when a feed is really huge, or is it something else?

Not really. mmap is useful when you have random access to large files, or
have multiple processes trying to access the same file. It might actually
work poorly with large files if your memory is fragmented. Since our I/O
patterns are sequential, the filesystem cache itself works very well.

Thanks,
Neha

On Tuesday, October 25, 2011, Jay Kreps <jay.kreps@gmail.com> wrote:
> It is actually just to allow data deletion: we just delete whole segments in
> the cleanup. There is not much value in tuning the file size for most
> situations, but the tradeoff is that with smaller files you will have more
> open files but be closer to your desired retention.hours and retention.size
> settings.
>
> -Jay
>
> On Tue, Oct 25, 2011 at 1:59 AM, Inder Pall <inder.pall@gmail.com> wrote:
>
>> I am playing around with "log.file.size" (controls the size of a segment
>> file in a partition) and "log.retention.hours" with the following config:
>> log.file.size=500
>> log.retention.hours=168
>>
>> Observation: I see multiple files getting generated within the same
>> partition. For example, my topic name is revenuefeed and I see the following:
>>
>> ls -lh /tmp/kafka-logs/revenuefeed-0/*
>> -rw-r--r-- 1 inder users 537 Oct 25 01:38 /tmp/kafka-logs/revenuefeed-0/00000000000000000000.kafka
>> -rw-r--r-- 1 inder users 512 Oct 25 01:39 /tmp/kafka-logs/revenuefeed-0/00000000000000000537.kafka
>>
>> Questions
>> --------------
>> 1. Shouldn't these two properties go hand in hand?
>> 2. Why would you want to have multiple files within a partition? The broker has
>> to store more info to figure out the right file within a partition.
>> 3. Is it to achieve an mmap-style optimization, allowing the broker to do
>> less I/O when a feed is really huge, or is it something else?
>>
>> -- Inder
>>
>
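A minimal Python sketch of the two mechanisms discussed in this thread: rolling a partition into a new segment file once the active file reaches log.file.size, and retention that only ever deletes whole segments. The roll rule (open a new file before an append once the active file has reached the limit) and the size-retention rule (drop the oldest segment only if at least `retention_bytes` would remain) are assumptions chosen for illustration; they reproduce the 537- and 512-byte files in Inder's listing but are not taken from the broker source. `PartitionLog` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    base_offset: int            # byte offset of the first message in this file
    size: int = 0               # bytes written so far
    last_modified: float = 0.0  # time of the last write, seconds since the epoch

    @property
    def file_name(self) -> str:
        # Start offset zero-padded to 20 digits, matching the ls listing.
        return f"{self.base_offset:020d}.kafka"


@dataclass
class PartitionLog:
    max_segment_bytes: int  # plays the role of log.file.size
    segments: List[Segment] = field(default_factory=lambda: [Segment(0)])

    def append(self, nbytes: int, now: float) -> None:
        active = self.segments[-1]
        # Assumed roll rule: once the active file has reached the limit, the
        # next write opens a new file, so a file can overshoot by one write.
        if active.size >= self.max_segment_bytes:
            active = Segment(active.base_offset + active.size)
            self.segments.append(active)
        active.size += nbytes
        active.last_modified = now

    def delete_older_than(self, cutoff: float) -> List[str]:
        """Time-based retention, e.g. cutoff = now - retention_hours * 3600.

        Whole inactive segments last written before `cutoff` are deleted;
        the active (last) segment is always kept.
        """
        deleted = []
        while len(self.segments) > 1 and self.segments[0].last_modified < cutoff:
            deleted.append(self.segments.pop(0).file_name)
        return deleted

    def trim_to_size(self, retention_bytes: int) -> List[str]:
        """Size-based retention that can only remove whole segments.

        The oldest segment goes only if at least `retention_bytes` would
        remain, so the retained size can exceed the target by up to a segment.
        """
        deleted = []
        total = sum(s.size for s in self.segments)
        while len(self.segments) > 1 and total - self.segments[0].size >= retention_bytes:
            oldest = self.segments.pop(0)
            total -= oldest.size
            deleted.append(oldest.file_name)
        return deleted
```

With `max_segment_bytes=500`, appending 537 bytes and then 512 bytes yields 00000000000000000000.kafka and 00000000000000000537.kafka. Because deletion is per segment, `trim_to_size(600)` deletes nothing (removing the 537-byte file would leave only 512 bytes), while `trim_to_size(500)` removes the older file. Smaller segments let the retained size land closer to the target, at the cost of more open files.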
