kafka-users mailing list archives

From Benjamin Manns <benma...@gmail.com>
Subject Re: reprocessing events from multiple topics
Date Wed, 04 May 2016 03:53:04 GMT
Both of your ideas are doable. One thing to keep in mind: Kafka orders
events by when they are appended to a partition, not by their timestamps.
Depending on your data source, late-arriving data will land after events
that are already committed rather than being sorted in front of them. You
may need a windowing buffer so you can recalculate for stragglers.
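
To make the windowing buffer idea concrete, here is a minimal sketch in
plain Python (no Kafka client involved). The class name ReorderBuffer and
the max_lateness parameter are made up for illustration. It assumes each
event carries a numeric timestamp, and it holds events back until they are
older than the newest timestamp seen minus the allowed lateness.

```python
import heapq
from itertools import count


class ReorderBuffer:
    """Hypothetical helper: release events in timestamp order after a delay."""

    def __init__(self, max_lateness):
        self.max_lateness = max_lateness
        self._heap = []          # min-heap ordered by timestamp
        self._seq = count()      # tie-breaker so payloads are never compared
        self._max_seen = None    # newest timestamp observed so far

    def push(self, timestamp, event):
        """Add one event; return the events now safe to emit, oldest first."""
        heapq.heappush(self._heap, (timestamp, next(self._seq), event))
        if self._max_seen is None or timestamp > self._max_seen:
            self._max_seen = timestamp
        watermark = self._max_seen - self.max_lateness
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            ts, _, ev = heapq.heappop(self._heap)
            ready.append((ts, ev))
        return ready

    def flush(self):
        """Drain everything still buffered, oldest first (end of reprocessing)."""
        ready = []
        while self._heap:
            ts, _, ev = heapq.heappop(self._heap)
            ready.append((ts, ev))
        return ready


buf = ReorderBuffer(max_lateness=5)
first = buf.push(10, "a")    # nothing is old enough yet
second = buf.push(8, "b")    # straggler, still inside the window
third = buf.push(16, "c")    # watermark moves to 11, releasing 8 and 10
rest = buf.flush()
```

An event that arrives later than the window allows is still released
immediately, behind events that were already emitted, so that case needs
its own handling (recalculate or drop).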

For the multiple-topic approach, check out Samza's MessageChooser
(https://wiki.apache.org/samza/Pluggable%20MessageChooser), which lets you
decide which input stream the next message is taken from. Other stream
processors may have something similar.
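
Independent of any particular stream processor, the "always pull the
oldest one next" idea is a k-way merge by timestamp. A rough sketch in
plain Python follows. This is not Samza's API. The in-memory lists stand in
for per-topic consumers, the function name merge_by_timestamp is made up,
and it assumes each topic is already in non-decreasing timestamp order
(for example, one partition per topic).

```python
import heapq


def merge_by_timestamp(*streams):
    """Lazily yield events from several ordered streams, oldest first.

    heapq.merge only looks at the next event of each stream, so nothing
    has to be loaded and sorted in memory up front.
    """
    return heapq.merge(*streams, key=lambda event: event["timestamp"])


# Hypothetical sample data standing in for two topics.
user_created = [
    {"timestamp": 1, "type": "userCreated", "entity_id": "u1"},
    {"timestamp": 4, "type": "userCreated", "entity_id": "u2"},
]
user_updated = [
    {"timestamp": 2, "type": "userUpdated", "entity_id": "u1"},
    {"timestamp": 5, "type": "userUpdated", "entity_id": "u2"},
]

merged = list(merge_by_timestamp(user_created, user_updated))
order = [(e["timestamp"], e["type"]) for e in merged]
```

If a topic has several partitions, ordering only holds within each
partition, so each partition would have to be treated as its own stream.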

On Tuesday, May 3, 2016, Kyle Mathews <mathews.kyle@gmail.com> wrote:

> Hi Kafka Users,
> I'm thinking through how to convert my application to use Kafka. I use an
> event sourcing model and something I do frequently is reprocess old events
> when I change a model schema or update my processing code.
> In my current setup, I have few enough events that I can just load all the
> event types that feed into a model and sort them all and then reprocess
> them. There's starting to be enough events though now that loading/sorting
> events in memory is getting slow and sometimes causing OOM crashes.
> So one very attractive thing about Kafka is that all events are sorted so
> in theory, I just need to set a consumer's offset to 0 and things will just
> work™. But I've read that each event should have its own topic which raises
> the question how do I reprocess a model that's pulling from multiple topics
> while maintaining the order of events across multiple topics.
> So for the User model, say I have two events, userCreated and userUpdated
> each with a timestamp and an entity_id pointing to the user. If I'm
> reprocessing these, is there a normal pattern for how to pull events in
> order from multiple topics?
> One solution I've thought of is for producers to publish events to both
> event-specific topics as well as model topics e.g. userCreated would get
> published to the "userCreated" topic as well as the "user" topic.
> Another is that the stream processor for User, when reprocessing, would
> just look at the next event from each topic it's pulling from and always
> pull the oldest one next. Slightly tricky code but doable.
> Thoughts?

Benjamin Manns
(434) 321-8324
