kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Schierbeck <daniel.schierb...@gmail.com>
Subject Re: Kafka as an event store for Event Sourcing
Date Sat, 13 Jun 2015 16:59:06 GMT
@Jay:

Regarding your first proposal: wouldn't that mean that a producer wouldn't
know whether a write succeeded? In the case of event sourcing, a failed CAS
may require re-validating the input with the new state. Simply discarding
the write would be wrong.

As for the second idea: how would a client of the writer service know which
writer is the leader? For example, how would a load balancer know which web
app process to route requests to? Ideally, all processes would be able to
handle requests.

Using conditional writes would allow any producer to write and provide
synchronous feedback to the producers.
On fre. 12. jun. 2015 at 18.41 Jay Kreps <jay@confluent.io> wrote:

> I have been thinking a little about this. I don't think CAS actually
> requires any particular broker support. Rather the two writers just write
> messages with some deterministic check-and-set criteria and all the
> replicas read from the log and check this criteria before applying the
> write. This mechanism has the downside that it creates additional writes
> when there is a conflict and requires waiting on the full roundtrip (write
> and then read) but it has the advantage that it is very flexible as to the
> criteria you use.
>
> An alternative strategy for accomplishing the same thing a bit more
> efficiently is to elect leaders amongst the writers themselves. This would
> require broker support for single writer to avoid the possibility of split
> brain. I like this approach better because the leader for a partition can
> then do anything they want on their local data to make the decision of what
> is committed, however the downside is that the mechanism is more involved.
>
> -Jay
>
> On Fri, Jun 12, 2015 at 6:43 AM, Ben Kirwin <ben@kirw.in> wrote:
>
> > Gwen: Right now I'm just looking for feedback -- but yes, if folks are
> > interested, I do plan to do that implementation work.
> >
> > Daniel: Yes, that's exactly right. I haven't thought much about
> > per-key... it does sound useful, but the implementation seems a bit
> > more involved. Want to add it to the ticket?
> >
> > On Fri, Jun 12, 2015 at 7:49 AM, Daniel Schierbeck
> > <daniel.schierbeck@gmail.com> wrote:
> > > Ben: your solutions seems to focus on partition-wide CAS. Have you
> > > considered per-key CAS? That would make the feature more useful in my
> > > opinion, as you'd greatly reduce the contention.
> > >
> > > On Fri, Jun 12, 2015 at 6:54 AM Gwen Shapira <gshapira@cloudera.com>
> > wrote:
> > >
> > >> Hi Ben,
> > >>
> > >> Thanks for creating the ticket. Having check-and-set capability will
> be
> > >> sweet :)
> > >> Are you planning to implement this yourself? Or is it just an idea for
> > >> the community?
> > >>
> > >> Gwen
> > >>
> > >> On Thu, Jun 11, 2015 at 8:01 PM, Ben Kirwin <ben@kirw.in> wrote:
> > >> > As it happens, I submitted a ticket for this feature a couple days
> > ago:
> > >> >
> > >> > https://issues.apache.org/jira/browse/KAFKA-2260
> > >> >
> > >> > Couldn't find any existing proposals for similar things, but it's
> > >> > certainly possible they're out there...
> > >> >
> > >> > On the other hand, I think you can solve your particular issue by
> > >> > reframing the problem: treating the messages as 'requests' or
> > >> > 'commands' instead of statements of fact. In your flight-booking
> > >> > example, the log would correctly reflect that two different people
> > >> > tried to book the same flight; the stream consumer would be
> > >> > responsible for finalizing one booking, and notifying the other
> client
> > >> > that their request had failed. (In-browser or by email.)
> > >> >
> > >> > On Wed, Jun 10, 2015 at 5:04 AM, Daniel Schierbeck
> > >> > <daniel.schierbeck@gmail.com> wrote:
> > >> >> I've been working on an application which uses Event Sourcing,
and
> > I'd
> > >> like
> > >> >> to use Kafka as opposed to, say, a SQL database to store events.
> This
> > >> would
> > >> >> allow me to easily integrate other systems by having them read
off
> > the
> > >> >> Kafka topics.
> > >> >>
> > >> >> I do have one concern, though: the consistency of the data can
only
> > be
> > >> >> guaranteed if a command handler has a complete picture of all
past
> > >> events
> > >> >> pertaining to some entity.
> > >> >>
> > >> >> As an example, consider an airline seat reservation system. Each
> > >> >> reservation command issued by a user is rejected if the seat has
> > already
> > >> >> been taken. If the seat is available, a record describing the
event
> > is
> > >> >> appended to the log. This works great when there's only one
> producer,
> > >> but
> > >> >> in order to scale I may need multiple producer processes. This
> > >> introduces a
> > >> >> race condition: two command handlers may simultaneously receive
a
> > >> command
> > >> >> to reserver the same seat. The event log indicates that the seat
is
> > >> >> available, so each handler will append a reservation event –
thus
> > >> >> double-booking that seat!
> > >> >>
> > >> >> I see three ways around that issue:
> > >> >> 1. Don't use Kafka for this.
> > >> >> 2. Force a singler producer for a given flight. This will impact
> > >> >> availability and make routing more complex.
> > >> >> 3. Have a way to do optimistic locking in Kafka.
> > >> >>
> > >> >> The latter idea would work either on a per-key basis or globally
> for
> > a
> > >> >> partition: when appending to a partition, the producer would
> > indicate in
> > >> >> its request that the request should be rejected unless the current
> > >> offset
> > >> >> of the partition is equal to x. For the per-key setup, Kafka
> brokers
> > >> would
> > >> >> track the offset of the latest message for each unique key, if
so
> > >> >> configured. This would allow the request to specify that it should
> be
> > >> >> rejected if the offset for key k is not equal to x.
> > >> >>
> > >> >> This way, only one of the command handlers would succeed in writing
> > to
> > >> >> Kafka, thus ensuring consistency.
> > >> >>
> > >> >> There are different levels of complexity associated with
> implementing
> > >> this
> > >> >> in Kafka depending on whether the feature would work per-partition
> or
> > >> >> per-key:
> > >> >> * For the per-partition optimistic locking, the broker would just
> > need
> > >> to
> > >> >> keep track of the high water mark for each partition and reject
> > >> conditional
> > >> >> requests when the offset doesn't match.
> > >> >> * For per-key locking, the broker would need to maintain an
> in-memory
> > >> table
> > >> >> mapping keys to the offset of the last message with that key.
This
> > >> should
> > >> >> be fairly easy to maintain and recreate from the log if necessary.
> It
> > >> could
> > >> >> also be saved to disk as a snapshot from time to time in order
to
> cut
> > >> down
> > >> >> the time needed to recreate the table on restart. There's a small
> > >> >> performance penalty associated with this, but it could be opt-in
> for
> > a
> > >> >> topic.
> > >> >>
> > >> >> Am I the only one thinking about using Kafka like this? Would
this
> > be a
> > >> >> nice feature to have?
> > >>
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message