kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jay Kreps <...@confluent.io>
Subject Re: KafkaStreams StateStore as EventStore (Event Sourcing)
Date Fri, 16 Dec 2016 19:27:48 GMT
Good question! Here's my understanding.

The streams API has a config num.standby.replicas. If this value is set to
0, the default, then the local state will have to be recreated by
re-reading the relevant Kafka partition and replaying that into the state
store, and as you point out this will take time proportional to the amount
of data. If you set this value to something more than 0, then a "standby
task" will be kept on one of the other instances. This standby won't do any
processing it will just passively replicate the state changes of the
primary task; in the event of a failure this standby task will be able to
take over very quickly because it already has the full state pre-created.

So you have a choice of redundancy in either "time" (by replaying data) or
"space" (by storing multiple copies).

(Hopefully that's correct, I don't have the firmest grasp on how the
standby tasks work.)

-Jay

On Thu, Dec 15, 2016 at 6:10 PM, Anatoly Pulyaevskiy <
anatoly.pulyaevskiy@gmail.com> wrote:

> Hi everyone,
>
> I've been reading a lot about new features in Kafka Streams and everything
> looks very promising. There is even an article on Kafka and Event Sourcing:
> https://www.confluent.io/blog/event-sourcing-cqrs-stream-
> processing-apache-kafka-whats-connection/
>
> There are a couple of things that I'm concerned about though. For Event
> Sourcing it is assumed that there is a way to fetch all events for a
> particular object and replay them in order to get "latest snapshot" of that
> object.
>
> It seems like (and the article says so) that StateStore in KafkaStreams can
> be used to achieve that.
>
> My first question is would it scale well for millions of objects?
> I understand that StateStore is backed by a compacted Kafka topic so in an
> event of failure KafkaStreams will recover to the latest state by reading
> all messages from that topic. But my suspicion is that for millions of
> objects this may take a while (it would need to read the whole partition
> for each object), is this a correct assumption?
>
> My second question is would it make more sense to use an external DB in
> such case or is there a "best practice" around implementing Event Sourcing
> and using Kafka's internal StateStore as EventStore?
>
> Thanks,
> Anatoly
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message