kafka-users mailing list archives

From Willy Hoang ...@knewton.com>
Subject Per-topic retention.bytes uses kilobytes not bytes?
Date Thu, 02 Apr 2015 21:27:42 GMT
Hello,

I’ve been having trouble with the per-topic retention.bytes configuration (Kafka version
0.8.2.1). I ran into the same issue that users described in the two threads below, where
logs grew larger than retention.bytes. Neither thread offers an explanation for the
behavior.
http://search-hadoop.com/m/4TaT4Y2YRD1
http://search-hadoop.com/m/4TaT4A94w9

After a bit of exploring I came up with a hypothesis: retention.bytes uses kilobytes, not
bytes, as its unit of measurement. 

Below are reproducible steps that support my hypothesis.

# Create a new topic with retention.bytes = 1 and segment.bytes = 1024
./kafka-topics.sh --create --zookeeper `kafka-zookeeper` --replication-factor 2 \
  --partitions 1 --topic test-topic-wh --config retention.ms=604800000 \
  --config retention.bytes=1 --config segment.bytes=1024

# Produce a message that will add 1024 bytes to the log (26 bytes of metadata and 998 bytes
# from the message string)
# [2015-04-01 21:31:30,192]
./kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic-wh
48511592621585064912153832133745068851354167277338568723801212367882940512382099547077656452011868167062280671787644034983697360468153738320733530248963074919916340211639682996497736197584019505594305204918092844365775522508769053709992262705578058943319678767004341493111503613353102924979561571028366773343124814043716584730147544725607450538227253470831289390680687225547253363513232291750196998204510607040879259384601451167183178896571219320889861706587525006032028098059014382213355803535550612056296013517434057006192416475524344248518557786455850822677869343421138195772284656076117000648020242375211903419500185954902765027000903916410762342630905680728543902271883661840640596483915010329616341194914110460126269112972976548329834183117816884560790416331259138123086341037733285781009676617847368669318437423457236162645890525200414080351181649588421908379380799396957194784506503965311272014255330651454364327607848972940341663812345678085583832958639819357061848511592621585064912153832
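
The 26 bytes of metadata mentioned above can be accounted for with a short sketch. It assumes the 0.8-era on-disk message layout (8-byte offset, 4-byte size, 4-byte CRC, 1-byte magic, 1-byte attributes, 4-byte key length, 4-byte value length) and a message with no key. That field breakdown is an assumption and is not taken from this thread, and the function name is hypothetical.

```python
# Sketch: bytes that one uncompressed, keyless message adds to a .log file,
# ASSUMING the Kafka 0.8 message layout (the field sizes below are an assumption).
LOG_ENTRY_OVERHEAD = 8 + 4            # offset + message size
MESSAGE_HEADER = 4 + 1 + 1 + 4 + 4    # CRC, magic, attributes, key length, value length


def on_disk_size(payload: str) -> int:
    """Return the estimated number of bytes one message adds to the log."""
    return LOG_ENTRY_OVERHEAD + MESSAGE_HEADER + len(payload.encode("utf-8"))


print(on_disk_size("7" * 998))  # a 998-character payload
print(on_disk_size(""))         # an empty payload
```

Under these assumptions a 998-character string comes to 1024 bytes and an empty message to 26 bytes, which matches the two .log file sizes in the listings.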

ls -r -l /mnt*/spool/kafka/test-topic-wh*
total 4
-rw-r--r-- 1 isaak isaak     1024 Apr  1 21:31 00000000000000000018.log
-rw-r--r-- 1 isaak isaak 10485760 Apr  1 21:27 00000000000000000018.index

# Wait about 10 minutes (longer than the 5 minute retention check interval)
# Note that no changes occurred

# Produce a message of any size to exceed the 1024-byte (1 KB) retention limit
# [2015-04-01 21:40:04,851]
./kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic-wh

ls -r -l /mnt*/spool/kafka/test-topic-wh*
total 8
-rw-r--r-- 1 isaak isaak       26 Apr  1 21:40 00000000000000000020.log
-rw-r--r-- 1 isaak isaak 10485760 Apr  1 21:40 00000000000000000020.index
-rw-r--r-- 1 isaak isaak     1024 Apr  1 21:31 00000000000000000018.log
-rw-r--r-- 1 isaak isaak        0 Apr  1 21:40 00000000000000000018.index

# Note from /var/log/kafka/server.log that the older segment is deleted now that we have
# exceeded the retention.bytes limit
[2015-04-01 21:40:10,114] INFO Rolled new log segment for 'test-topic-wh-0' in 0 ms. (kafka.log.Log)
[2015-04-01 21:42:16,214] INFO Scheduling log segment 18 for log test-topic-wh-0 for deletion. (kafka.log.Log)
[2015-04-01 21:43:16,217] INFO Deleting segment 18 from log test-topic-wh-0. (kafka.log.Log)
[2015-04-01 21:43:16,217] INFO Deleting index /mnt/spool/kafka/test-topic-wh-0/00000000000000000018.index.deleted (kafka.log.OffsetIndex)
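
The gaps between these log lines can be computed directly from their timestamps. The sketch below only parses the three timestamps copied from the log excerpt; the helper name is hypothetical.

```python
from datetime import datetime

# Timestamp format used in server.log, e.g. "2015-04-01 21:40:10,114"
FMT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two server.log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


rolled = "2015-04-01 21:40:10,114"
scheduled = "2015-04-01 21:42:16,214"
deleted = "2015-04-01 21:43:16,217"

print(seconds_between(rolled, scheduled))   # roll -> scheduled for deletion
print(seconds_between(scheduled, deleted))  # scheduled -> actually deleted
```

The segment was scheduled for deletion about two minutes after the roll, presumably at the next retention check, and removed from disk about 60 seconds later. The second gap looks consistent with a delayed file delete (file.delete.delay.ms, which I believe defaults to 60000), though that is an assumption and not something the log states.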

ls -r -l /mnt*/spool/kafka/test-topic-wh*
total 4
-rw-r--r-- 1 isaak isaak       26 Apr  1 21:40 00000000000000000020.log
-rw-r--r-- 1 isaak isaak 10485760 Apr  1 21:40 00000000000000000020.index
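
One way to sanity-check the listings above is to compare them against a simple model of size-based retention. The sketch below assumes that retention removes whole segments, oldest first, and only when the log would still be at least retention.bytes in size afterwards. That rule is an assumption made for illustration (the function name is hypothetical) and is not confirmed anywhere in this thread; it is meant as an alternative reading to compare with the kilobyte hypothesis.

```python
# Sketch of a size-based retention pass that only removes whole segments.
# ASSUMPTION: a segment is deleted only if the log would still be at least
# retention_bytes in size afterwards. This is a model, not broker code.

def segments_to_delete(segment_sizes, retention_bytes):
    """segment_sizes is ordered oldest first; returns the sizes of deleted segments."""
    excess = sum(segment_sizes) - retention_bytes
    deleted = []
    for size in segment_sizes:
        if excess - size < 0:
            break  # removing this segment would drop the log below the limit
        excess -= size
        deleted.append(size)
    return deleted


# State after the first message: one 1024-byte segment
print(segments_to_delete([1024], retention_bytes=1))
# State after the second message: a 1024-byte segment and a 26-byte segment
print(segments_to_delete([1024, 26], retention_bytes=1))
```

With retention.bytes=1 taken literally as one byte, this model leaves the single 1024-byte segment alone and removes it once the 26-byte segment exists, which is the same sequence seen in the listings. If the model is right, this experiment alone may not distinguish it from the kilobyte hypothesis.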

I did a similar experiment with segment.bytes=2 and the results were consistent. Has anyone
else seen the same thing?

Regards,
Willy