kafka-users mailing list archives

From Martin Kleppmann <mkleppm...@linkedin.com>
Subject Re: Are offsets unique, immutable identifiers for a message in a topic?
Date Fri, 07 Mar 2014 21:32:50 GMT
On 7 Mar 2014, at 14:11, "Maier, Dr. Andreas" <andreas.maier@asideas.de> wrote:
>> In your case, it sounds like time-based retention with a fairly long
>> retention period is the way to go. You could potentially store the
>> offsets of messages to retry in a separate Kafka topic.
> I was also thinking about doing that. However, what do I do if I again
> get errors when processing the offsets from that Kafka topic?
> Since I cannot delete the offsets of messages from the Kafka topic that
> have been processed successfully, I would have to create another
> Kafka topic to again store the remaining offsets and then maybe another
> one and then another one and so on.

You might be interested to have a look at what Samza does: http://samza.incubator.apache.org/learn/documentation/0.7.0/
-- it's a stream processing framework that builds on Kafka's features. It still processes
messages sequentially per partition, so it doesn't do the per-message retry that you describe,
but it does use a separate Kafka topic for checkpointing state and recovering from failure.
(It doesn't require a cascade of topics.)
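
As a rough sketch of why one checkpoint topic is enough (this is a toy in-memory model, not Samza's actual code; `AppendOnlyLog` and `process_partition` are names made up for illustration): the checkpoint log is append-only too, but only its latest entry matters, so old entries never need to be deleted or moved to another topic.

```python
from typing import Any, Callable, List, Optional, Tuple


class AppendOnlyLog:
    """Toy stand-in for one partition of a topic: entries can only be appended."""

    def __init__(self) -> None:
        self._entries: List[Any] = []

    def append(self, value: Any) -> int:
        self._entries.append(value)
        return len(self._entries) - 1  # offset of the entry just written

    def read_from(self, offset: int) -> List[Tuple[int, Any]]:
        # (offset, value) pairs, starting at the requested offset
        return list(enumerate(self._entries))[offset:]

    def last(self) -> Optional[Any]:
        return self._entries[-1] if self._entries else None


def process_partition(
    messages: AppendOnlyLog,
    checkpoints: AppendOnlyLog,
    handle: Callable[[Any], None],
) -> None:
    """Process messages in order, resuming after the latest checkpoint."""
    last_done = checkpoints.last()
    start = 0 if last_done is None else last_done + 1
    for offset, value in messages.read_from(start):
        handle(value)               # an exception here stops the whole partition
        checkpoints.append(offset)  # record progress only after success
```

If `handle` fails, calling `process_partition` again resumes at the failed message. A crash between `handle` and the checkpoint write would cause that message to be processed a second time, so the handler has to tolerate duplicates.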

> That seems awkward to me.
> Wouldn't it be better to simply have a mutable list of offsets, read from
> that list and, if a message was successfully processed,
> remove the offset from the list? That way one could immediately see from
> the length of the list how many messages still need to be processed.
> Since Kafka topics are append only they don't seem to be a good fit for
> this kind of logic.

Indeed. If you want per-message acknowledgement and redelivery, perhaps something like RabbitMQ
or ActiveMQ is a better fit for your use case. Kafka's design is optimised for very high-throughput
sequential processing of messages, whereas RabbitMQ is better for "job queue" use cases where
you want to retry individual messages out-of-order.
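
To make the contrast concrete, here is a minimal sketch of the "job queue" behaviour, again as a toy in-memory model (`ToyJobQueue` is a hypothetical name; this is not the RabbitMQ or ActiveMQ API). A job that fails goes back on the queue while the others carry on, and the queue length is exactly the "how many are left" count mentioned above.

```python
from collections import deque
from typing import Any, Callable, Iterable


class ToyJobQueue:
    """Toy queue with per-message acknowledgement and redelivery."""

    def __init__(self, jobs: Iterable[Any]) -> None:
        self._pending = deque(jobs)

    def __len__(self) -> int:
        # Number of jobs that have not yet been processed successfully
        return len(self._pending)

    def process_once(self, handle: Callable[[Any], None]) -> None:
        """Attempt each currently pending job exactly once."""
        for _ in range(len(self._pending)):
            job = self._pending.popleft()
            try:
                handle(job)                # returning normally acts as the ack
            except Exception:
                self._pending.append(job)  # requeue: retried later, out of order
```

Unlike the sequential model, a failing job here does not block the jobs behind it, at the cost of losing ordering.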

