kafka-users mailing list archives

From: Martin Kleppmann <mkleppm...@linkedin.com>
Subject: Re: Are offsets unique, immutable identifiers for a message in a topic?
Date: Fri, 07 Mar 2014 10:43:37 GMT

Almost right: offsets are unique, immutable identifiers for a message within a topic-partition.
Each partition has its own sequence of offsets, but a (topic, partition, offset) triple uniquely
and persistently identifies a particular message.
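
To make that concrete, here is a minimal sketch in plain Python (no Kafka client involved). The MessageId type and the topic name "events" are made up for illustration. It only shows why the offset alone is ambiguous while the full triple is not.

```python
from typing import NamedTuple


class MessageId(NamedTuple):
    """Hypothetical identifier type: the (topic, partition, offset) triple."""
    topic: str
    partition: int
    offset: int


# Two partitions of the same topic each have their own offset sequence,
# so both can contain a message at offset 42.
a = MessageId("events", 0, 42)
b = MessageId("events", 1, 42)

# A set of failed messages keyed by the full triple keeps both entries;
# keyed by offset alone they would collide.
failed = {a, b}
failed_by_offset_only = {a.offset, b.offset}
```

Storing the triple rather than the bare offset is what lets a later retry find the right partition to read from.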

For log retention you have essentially two options: to discard messages older than some threshold
(which can be a few weeks if you have enough disk space, giving you plenty of time to recover
from failures), or to keep only the newest message for a given key and discard older messages
with the same key (http://kafka.apache.org/081/documentation.html#compaction). With
time-based retention, once a segment of the log expires, the offsets in that segment and in
all earlier segments become unavailable. With compaction, "holes" appear in the sequence of
offsets where old messages were discarded.
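
The effect of the two policies on offsets can be modelled with a toy log in plain Python. This is a simplified sketch, not how the broker is implemented: Record, apply_time_retention, apply_compaction, the segment size and the timestamps are all invented for the example, and details such as the active segment never being compacted are ignored.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    offset: int
    timestamp: int
    key: str
    value: str


def apply_time_retention(log: List[Record], segment_size: int,
                         now: int, max_age: int) -> List[Record]:
    """Drop every expired segment together with everything before it."""
    segments = [log[i:i + segment_size] for i in range(0, len(log), segment_size)]
    kept: List[Record] = []
    for seg in segments:
        # A segment counts as expired when its newest record is too old.
        if now - seg[-1].timestamp > max_age:
            kept = []  # this segment and all earlier offsets are gone
        else:
            kept.extend(seg)
    return kept


def apply_compaction(log: List[Record]) -> List[Record]:
    """Keep only the newest record per key; survivors keep their offsets."""
    latest = {}
    for rec in log:
        latest[rec.key] = rec  # later records overwrite earlier ones
    return sorted(latest.values(), key=lambda r: r.offset)


log = [
    Record(0, 100, "x", "v0"),
    Record(1, 110, "a", "v1"),
    Record(2, 120, "y", "v2"),
    Record(3, 130, "a", "v3"),
    Record(4, 140, "z", "v4"),
    Record(5, 150, "a", "v5"),
]

retained = apply_time_retention(log, segment_size=2, now=200, max_age=75)
compacted = apply_compaction(log)

print([r.offset for r in retained])   # [2, 3, 4, 5]: a prefix is cut off
print([r.offset for r in compacted])  # [0, 2, 4, 5]: holes at 1 and 3
```

In both cases the surviving messages keep their original offsets. A stored offset either still points at the same message or points at nothing, never at a different message.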

In your case, it sounds like time-based retention with a fairly long retention period is the
way to go. You could potentially store the offsets of messages to retry in a separate Kafka


On 7 Mar 2014, at 09:38, "Maier, Dr. Andreas" <andreas.maier@asideas.de> wrote:

> Hi,
> I have the following problem:
> My Kafka consumer is consuming messages, but the processing of the message
> might fail. I do not want to
> retry until success, but instead want to quickly consume the next message.
> However at a later time I might still want to reprocess the failed
> messages.
> So I thought about storing a list of offsets of the messages that have
> failed in the first try
> for later processing.
> But that would only make sense, if the offsets are unique, immutable
> identifiers for a message within a topic. Since Kafka deletes messages or
> compacts the log after some time,
> I was wondering if this is really the case?
> If not, how could I then uniquely identify a message within a topic, so
> that a consumer knows from
> where to start consuming again?
> Thank you,
> Andreas Maier
> AS ideAS Engineering
> Axel-Springer-Straße 65
> 10888 Berlin
> Mobile: +49 (0) 151 730 26 414
> andreas.maier@asideas.de
> Axel Springer ideAS Engineering GmbH
> A company of Axel Springer SE
> Registered office: Berlin, Charlottenburg District Court, HRB 138466 B
> Managing directors: Daniel Keller, Niels Matusch
