kafka-users mailing list archives

From Sergi Vladykin <sergi.vlady...@gmail.com>
Subject Re: Transactional Producer
Date Wed, 27 Nov 2019 17:16:08 GMT
Hi!

> I think we need to step back a little and understand what you are trying
> to achieve; that will help us give you an accurate answer.
>

Sure. I'm working on a pet project: a simple key-value database replicated
over Kafka.
I have already implemented simple atomic updates like putIfAbsent, but now I
want to support transactional updates of multiple keys.
So I'm trying to understand the limitations of Kafka transactions and how to
apply them correctly to this task.


> What order can I expect for these published messages?
> - This depends on different factors, like linger, batch size, buffers,
> etc., and even network latency.
>

Obviously, I'm not very interested in the case where the batch size and
linger are both huge, so that everything is batched together and the
transactions are published one after another.
I'm looking at the extreme case where there is no batching at all and
records are sent one by one in the described order, so that records of the
two transactions are interleaved.
Since the producer API lets us get the offset of a published record before
the transaction is committed, I think this interleaving must be possible.
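A toy model of the interleaving described above, written in Python for brevity. It is not Kafka code: `log`, `append`, `commit` and the transactional ids `t1`/`t2` are hypothetical. It assumes every record is sent on its own (no batching) and that each commit marker is appended the moment commit is called; in real Kafka the transaction coordinator writes the marker asynchronously, so its exact offset may differ. What it illustrates is that the partition assigns offsets in arrival order regardless of which transaction a record belongs to, and that commit markers are control records occupying offsets of their own.

```python
# Toy model of one partition's log (not Kafka code): every append gets the
# next offset, whichever producer or transaction it comes from.
log = []

def append(value, txn):
    """Append a data record and return its offset, as a send() ack would."""
    log.append(f"{value}({txn})")
    return len(log) - 1

def commit(txn):
    """Append a commit marker; it is a control record with its own offset.
    Assumption: written immediately, unlike Kafka's asynchronous coordinator."""
    log.append(f"COMMIT({txn})")
    return len(log) - 1

# The order from the original question: A, X, B, commit t1, Y, commit t2.
offsets = {}
offsets["A"] = append("A", "t1")
offsets["X"] = append("X", "t2")
offsets["B"] = append("B", "t1")
commit("t1")
offsets["Y"] = append("Y", "t2")
commit("t2")

print(offsets)  # {'A': 0, 'X': 1, 'B': 2, 'Y': 4}
print(log)      # ['A(t1)', 'X(t2)', 'B(t1)', 'COMMIT(t1)', 'Y(t2)', 'COMMIT(t2)']
```

With this layout the data records sit at offsets 0, 1, 2 and 4, while offsets 3 and 5 hold markers; consumers never return control records to the application, which is why transactional topics show gaps in the offsets the application sees.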


> We should get all the records in the order of their offsets, thus we will
> be able to consume A and will not be able
> to consume B until X is either committed or aborted?
> - This depends on the reading isolation level, partitions assigned to the
> consumer, ...
>

As I wrote in the original message, we are talking about a single
partition.
And obviously it must be a read_committed consumer; otherwise, transactions
are useless.
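A toy model of read_committed visibility on a single partition, again not Kafka code: `PartitionModel`, `Entry`, `scenario` and the `"COMMIT"`/`"ABORT"` strings are hypothetical, and it assumes every appended record is already replicated (high watermark equals log end). It follows the rule from KIP-98 that a read_committed consumer reads only up to the last stable offset (LSO), the first offset of the earliest still-open transaction, and drops records of aborted transactions. Under that rule the hang scenario from the earlier question gives LSO = 1, so A is visible while B, and even a later non-transactional record N, wait until X's transaction completes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    offset: int
    value: Optional[str]          # None for control (marker) records
    txn: Optional[str]            # transactional id; None if non-transactional
    marker: Optional[str] = None  # "COMMIT" or "ABORT" for control records

class PartitionModel:
    """Hypothetical model of one partition; assumes high watermark == log end."""

    def __init__(self):
        self.log = []
        self.open_txn_start = {}  # txn id -> first offset of its open transaction

    def append(self, value, txn=None):
        offset = len(self.log)
        if txn is not None and txn not in self.open_txn_start:
            self.open_txn_start[txn] = offset  # transaction starts on this partition
        self.log.append(Entry(offset, value, txn))
        return offset

    def end_txn(self, txn, marker):
        self.log.append(Entry(len(self.log), None, txn, marker))
        del self.open_txn_start[txn]

    def last_stable_offset(self):
        # First offset of the earliest open transaction, else the log end.
        return min(self.open_txn_start.values(), default=len(self.log))

    def read_committed(self):
        visible = []
        for e in self.log[: self.last_stable_offset()]:
            if e.marker is not None:
                continue  # control records are not returned to the application
            if e.txn is None or self._outcome(e) == "COMMIT":
                visible.append(e.value)  # aborted records are skipped
        return visible

    def _outcome(self, entry):
        # The first later marker for the same transactional id decides the record.
        for later in self.log[entry.offset + 1:]:
            if later.marker is not None and later.txn == entry.txn:
                return later.marker
        return None  # still open; cannot happen below the LSO

def scenario(t2_outcome):
    p = PartitionModel()
    p.append("A", txn="t1")
    p.append("X", txn="t2")
    p.append("B", txn="t1")
    p.end_txn("t1", "COMMIT")
    p.append("Y", txn="t2")
    p.append("N")  # non-transactional record written while t2 is still open
    before = (p.last_stable_offset(), p.read_committed())
    p.end_txn("t2", t2_outcome)
    return before, p.read_committed()

print(scenario("COMMIT"))  # ((1, ['A']), ['A', 'X', 'B', 'Y', 'N'])
print(scenario("ABORT"))   # ((1, ['A']), ['A', 'B', 'N'])
```

In real Kafka the wait is bounded: if a producer keeps a transaction open longer than its `transaction.timeout.ms`, the transaction coordinator aborts it, which releases the LSO.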


> If you give a little bit more context about what you are trying to achieve,
> probably we can help you further.
>

Thanks a lot for your help!

BTW, here is the link to the project if you are interested:
https://github.com/svladykin/ReplicaMap

Sergi


>
> Cheers!
> --
> Jonathan
>
> On Mon, Nov 25, 2019 at 6:51 AM Sergi Vladykin <sergi.vladykin@gmail.com>
> wrote:
>
> > Thanks a lot for your help!
> >
> > Another question about ordering and visibility.
> >
> > Let's say we have two transactional producers with different transactional
> > ids. They both publish records to the same partition like this:
> >
> > *thread1: startTx*
> > *thread1: record A*
> > *thread2: startTx*
> > *thread2: record X*
> > *thread1: record B*
> > *thread1: commit*
> > *thread2: record Y*
> > *thread2: commit*
> >
> > What order can I expect for these published messages?
> > It looks like an interleaved AXBY order should be possible (if the
> > records were not batched together).
> > But what if thread2 hangs for a long time right before the commit and
> > thread1 successfully commits?
> > We should get all the records in the order of their offsets, thus we will
> > be able to consume A and will not be able
> > to consume B until X is either committed or aborted?
> > Is my understanding right?
> >
> > Will the same happen when we have one transactional and one
> > non-transactional producer publishing to the same partition?
> >
> > Sergi
> >
> > On Sun, Nov 24, 2019 at 21:12, Jonathan Santilli <jonathansantilli@gmail.com>
> > wrote:
> >
> > > Hello Sergi,
> > >
> > > 1. Is it OK to mix transactional and non-transactional approaches with a
> > > single KafkaProducer instance?
> > > - This is not possible: a transactional producer cannot send data outside
> > > a transaction.
> > >
> > > I mean sometimes I want to publish multiple messages transactionally, but
> > > oftentimes just a single message.
> > > Starting a transaction for publishing a single message looks inefficient.
> > > What is the recommended approach here?
> > > - Try to batch the records if possible; otherwise, you need to begin and
> > > commit the transaction, even for a single record.
> > >
> > > 2. If I publish multiple messages to multiple partitions in a single
> > > transaction, is it guaranteed that either all or none of them are published?
> > > - Yes, this is the power of transactions: all or nothing.
> > >
> > > Is it possible to end up with only half of the messages published to half
> > > of the partitions in some failure scenario?
> > > - No, this is not possible if you use the transaction correctly.
> > >
> > > Please take a look at this simple gist with different KafkaProducer
> > > scenarios; I hope it helps:
> > > https://gist.github.com/jonathansantilli/3b69ebbcd24e7a30f66db790ef648f99
> > >
> > >
> > > Cheers!
> > > --
> > > Jonathan
> > >
> > >
> > >
> > > On Sat, Nov 23, 2019 at 8:33 PM Sergi Vladykin <sergi.vladykin@gmail.com>
> > > wrote:
> > >
> > > > Hi!
> > > >
> > > > I have two questions related to transactional producers:
> > > >
> > > > 1. Is it OK to mix transactional and non-transactional approaches with a
> > > > single KafkaProducer instance? I mean sometimes I want to publish
> > > > multiple messages transactionally, but oftentimes just a single message.
> > > > Starting a transaction for publishing a single message looks
> > > > inefficient. What is the recommended approach here?
> > > >
> > > > 2. If I publish multiple messages to multiple partitions in a single
> > > > transaction, is it guaranteed that either all or none of them are
> > > > published? Is it possible to end up with only half of the messages
> > > > published to half of the partitions in some failure scenario?
> > > >
> > > > Sergi
> > > >
> > >
> > >
> > > --
> > > Santilli Jonathan
> > >
> >
>
>
> --
> Santilli Jonathan
>
