kafka-users mailing list archives

From Willy Hoang ...@knewton.com>
Subject Per-topic retention.bytes uses kilobytes not bytes?
Date Thu, 02 Apr 2015 21:27:42 GMT
Hello,

I’ve been having trouble using the per-topic retention.bytes configuration (Kafka version
0.8.2.1). I ran into the same issue described in the two threads below, where logs grew
larger than retention.bytes, and neither thread offered an explanation for it.
http://search-hadoop.com/m/4TaT4Y2YRD1
http://search-hadoop.com/m/4TaT4A94w9

After some exploring, I came up with a hypothesis: retention.bytes uses kilobytes, not
bytes, as its unit of measurement.

Below are reproducible steps that support my findings.

# Create a new topic with retention.bytes = 1 and segment.bytes = 1024
./kafka-topics.sh --create --zookeeper `kafka-zookeeper` --replication-factor 2 \
  --partitions 1 --topic test-topic-wh --config retention.ms=604800000 \
  --config retention.bytes=1 --config segment.bytes=1024
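
As a quick sanity check that time-based retention plays no part in this experiment, the retention.ms value above can be converted to days. A minimal bash sketch; the variable names are illustrative only:

```shell
#!/usr/bin/env bash
# Convert the retention.ms value used when creating the topic into days.
retention_ms=604800000
ms_per_day=$((24 * 60 * 60 * 1000))   # 86400000 ms in one day
echo "retention.ms = $((retention_ms / ms_per_day)) days"
```

At seven days, time-based retention cannot account for a segment being deleted a few minutes after it was written.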

# Produce a message that adds 1024 bytes to the log (26 bytes of metadata plus a 998-byte
# message string)
# [2015-04-01 21:31:30,192]
./kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic-wh
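
The console producer reads one message per line from stdin, so the 998-byte payload can be generated instead of typed. A minimal bash sketch, assuming the same broker and topic as above; `make_payload` is a hypothetical helper name. The 26-byte overhead breakdown assumes the 0.8 on-disk message format with a null key (offset, message size, CRC, magic byte, attributes, key length, value length):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a payload of exactly N bytes (no trailing newline).
make_payload() {
  head -c "$1" /dev/zero | tr '\0' 'a'
}

# Assumed per-message overhead in the 0.8 log format with a null key:
# offset(8) + message size(4) + crc(4) + magic(1) + attributes(1)
# + key length(4) + value length(4)
overhead=$((8 + 4 + 4 + 1 + 1 + 4 + 4))

payload_len=998
echo "expected bytes on disk: $((overhead + payload_len))"

# The console producer treats each input line as one message.
# Uncomment to send the payload to the test topic:
# { make_payload "$payload_len"; echo; } | \
#   ./kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic-wh
```

Under the same assumption, a 26-byte segment such as the one in the next listing would correspond to a message with an empty payload.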


ls -r -l /mnt*/spool/kafka/test-topic-wh*
total 4
-rw-r--r-- 1 isaak isaak     1024 Apr  1 21:31 00000000000000000018.log
-rw-r--r-- 1 isaak isaak 10485760 Apr  1 21:27 00000000000000000018.index
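
Because retention.bytes is compared against the size of a partition's .log segment files (the .index files are not part of it), it can help to total those files rather than reading `ls` output by eye. A minimal bash sketch; `log_bytes` is a hypothetical helper, and the example path is the partition directory that appears in the server.log excerpt in this message:

```shell
#!/usr/bin/env bash
# Hypothetical helper: total size in bytes of the .log segment files
# in one partition directory (index files are deliberately excluded).
log_bytes() {
  local dir=$1 total=0 f
  for f in "$dir"/*.log; do
    [ -e "$f" ] || continue          # glob matched nothing: no .log files
    total=$((total + $(wc -c < "$f")))
  done
  echo "$total"
}

# Example:
# log_bytes /mnt/spool/kafka/test-topic-wh-0
```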

# Wait about 10 minutes (longer than the 5-minute retention check interval)
# Note that nothing has changed

# Produce a message of any size so that the log exceeds the 1024-byte (1 KB) retention limit
# [2015-04-01 21:40:04,851]
./kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic-wh

ls -r -l /mnt*/spool/kafka/test-topic-wh*
total 8
-rw-r--r-- 1 isaak isaak       26 Apr  1 21:40 00000000000000000020.log
-rw-r--r-- 1 isaak isaak 10485760 Apr  1 21:40 00000000000000000020.index
-rw-r--r-- 1 isaak isaak     1024 Apr  1 21:31 00000000000000000018.log
-rw-r--r-- 1 isaak isaak        0 Apr  1 21:40 00000000000000000018.index
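
To make the hypothesis concrete: if retention.bytes were counted in kilobytes, retention.bytes=1 would mean a 1024-byte limit, and only the second listing exceeds it. A bash sketch of that comparison; `hypothesized_limit` is a hypothetical name, the 1024 multiplier is the hypothesis being tested rather than documented Kafka behavior, and the totals are the .log sizes from the two listings:

```shell
#!/usr/bin/env bash
# Hypothesis under test (NOT documented Kafka behavior): the limit is
# retention.bytes * 1024 bytes rather than retention.bytes bytes.
hypothesized_limit() {
  echo $(( $1 * 1024 ))
}

retention_bytes=1
limit=$(hypothesized_limit "$retention_bytes")

# Total .log bytes before and after the second message was produced.
for total in 1024 $((1024 + 26)); do
  if [ "$total" -gt "$limit" ]; then
    echo "$total bytes: over the hypothesized $limit-byte limit"
  else
    echo "$total bytes: within the hypothesized $limit-byte limit"
  fi
done
```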

# Note from /var/log/kafka/server.log that the older segment is deleted now that we have
# exceeded the retention.bytes limit
[2015-04-01 21:40:10,114] INFO Rolled new log segment for 'test-topic-wh-0' in 0 ms. (kafka.log.Log)
[2015-04-01 21:42:16,214] INFO Scheduling log segment 18 for log test-topic-wh-0 for deletion. (kafka.log.Log)
[2015-04-01 21:43:16,217] INFO Deleting segment 18 from log test-topic-wh-0. (kafka.log.Log)
[2015-04-01 21:43:16,217] INFO Deleting index /mnt/spool/kafka/test-topic-wh-0/00000000000000000018.index.deleted (kafka.log.OffsetIndex)
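
To watch for the deletion without scanning the whole broker log, the relevant lines can be filtered out of server.log. A minimal bash sketch; `deletion_events` is a hypothetical helper, and its patterns match only the 0.8.2.1 message wording quoted in this message, so other versions may need different patterns:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print segment-deletion lines for one partition.
# The patterns match the 0.8.2.1 log messages quoted in this message.
deletion_events() {
  local logfile=$1 partition=$2
  grep -E "Scheduling log segment [0-9]+ for log $partition |Deleting segment [0-9]+ from log $partition\." "$logfile"
}

# Example:
# deletion_events /var/log/kafka/server.log test-topic-wh-0
```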

ls -r -l /mnt*/spool/kafka/test-topic-wh*
total 4
-rw-r--r-- 1 isaak isaak       26 Apr  1 21:40 00000000000000000020.log
-rw-r--r-- 1 isaak isaak 10485760 Apr  1 21:40 00000000000000000020.index

I ran a similar experiment with segment.bytes=2 and the results were consistent. Has anyone
else seen the same thing?

Regards,
Willy