kafka-users mailing list archives

From Jason Rosenberg <...@squareup.com>
Subject Re: new producer api and batched Futures....
Date Fri, 21 Nov 2014 01:40:22 GMT
I guess it would make the api less clean, but I can imagine a sendBatch
method, which returns a single Future that gets triggered only when all
messages in the batch were finished.  The callback info could then contain
info about the success/exceptions encountered by each sub-group of
messages.  And the callback could even be called multiple times, once for
each sub-batch sent.   It gets complicated to think about it, but it would
be fewer Future objects created and less async contention/waiting, etc.

I'll try it out and see....

Jason
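
A minimal sketch of the single-Future idea above, using only java.util.concurrent. The class BatchTracker and its methods are hypothetical names and are not part of the Kafka client. It assumes each message's completion callback (for example, one handed to the producer's send call) reports into the tracker by message index, and that the caller knows the batch size up front.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: exposes one future for a whole batch of sends. */
final class BatchTracker {
    // Number of messages whose completion has not been reported yet.
    private final AtomicInteger pending;
    // Message index -> exception, for the messages that failed.
    private final Map<Integer, Exception> failures = new ConcurrentHashMap<>();
    // Completes once, when every message in the batch has reported.
    private final CompletableFuture<Map<Integer, Exception>> done = new CompletableFuture<>();

    BatchTracker(int batchSize) {
        this.pending = new AtomicInteger(batchSize);
        if (batchSize == 0) {
            done.complete(failures);
        }
    }

    /** Call once per message from its completion callback; error is null on success. */
    void onCompletion(int index, Exception error) {
        if (error != null) {
            failures.put(index, error);
        }
        if (pending.decrementAndGet() == 0) {
            done.complete(failures);
        }
    }

    /** The single future the caller waits on; its value maps failed indexes to errors. */
    CompletableFuture<Map<Integer, Exception>> future() {
        return done;
    }

    public static void main(String[] args) {
        BatchTracker tracker = new BatchTracker(3);
        tracker.onCompletion(0, null);
        tracker.onCompletion(1, new RuntimeException("simulated send failure"));
        tracker.onCompletion(2, null);
        Map<Integer, Exception> failed = tracker.future().join();
        System.out.println("failed indexes: " + failed.keySet());
    }
}
```

The caller retains one future per batch instead of one per message. This sketch reports once at the end, not once per sub-batch as floated above.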

On Thu, Nov 20, 2014 at 7:56 PM, Jay Kreps <jay.kreps@gmail.com> wrote:

> Internally it works as you describe: there is only one CountDownLatch per
> batch sent, and each of the futures is just a wrapper around that.
>
> It is true that if you accumulate thousands of futures in a list that may
> be a fair number of objects you are retaining, and there will be some work
> involved in checking them all. If you are sure they are all going to the
> same partition you can actually wait on the last future since sends are
> ordered within a partition. So when the final send completes the prior
> sends should also have completed.
>
> Either way if you see a case where the new producer isn't as fast as the
> old producer let us know.
>
> -Jay
>
>
>
> On Thu, Nov 20, 2014 at 4:24 PM, Jason Rosenberg <jbr@squareup.com> wrote:
>
> > I've been looking at the new producer api with anticipation, but have not
> > fired it up yet.
> >
> > One question I have, is it looks like there's no longer a 'batch' send
> > mode (and I get that this is all now handled internally, e.g. you send
> > individual messages, that then get collated and batched up and sent out).
> >
> > What I'm wondering, is whether there's added overhead in the producer
> > (and the client code) having to manage all the Future return Objects from
> > all the individual messages sent?  If I'm sending 100K messages/second,
> > etc., that seems like a lot of async Future Objects that have to be
> > tickled, and waited for, etc.  Does not this cause some overhead?
> >
> > If I send a bunch of messages and then store all the Futures in a list,
> > and then wait for all of them, it seems like a lot of thread contention.
> > On the other hand, if I send a batch of messages, that are likely all to
> > get sent as a single batch over the wire (cuz they are all going to the
> > same partition), wouldn't there be some benefit in only having to wait
> > for a single Future Object for the batch?
> >
> > Jason
> >
>
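
Jay's description of the internals (one CountDownLatch per batch, with each per-message future a thin wrapper around it) can be illustrated roughly as follows. This is a sketch under assumptions, not the producer's actual code: SharedBatchResult, futureFor and complete are hypothetical names, and the result of each future is assumed to be simply the batch's base offset plus the message's position in the batch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: one latch per batch, shared by all per-message futures. */
final class SharedBatchResult {
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile long baseOffset = -1L;
    private volatile Exception error;

    /** Called once when the whole batch has been acknowledged or has failed. */
    void complete(long baseOffset, Exception error) {
        this.baseOffset = baseOffset;
        this.error = error;
        latch.countDown(); // releases every future that wraps this batch
    }

    /** Returns a lightweight future for the message at the given position in the batch. */
    Future<Long> futureFor(final int relativeOffset) {
        return new Future<Long>() {
            @Override public boolean cancel(boolean mayInterrupt) { return false; }
            @Override public boolean isCancelled() { return false; }
            @Override public boolean isDone() { return latch.getCount() == 0; }

            @Override public Long get() throws InterruptedException, ExecutionException {
                latch.await();
                return result();
            }

            @Override public Long get(long timeout, TimeUnit unit)
                    throws InterruptedException, ExecutionException, TimeoutException {
                if (!latch.await(timeout, unit)) {
                    throw new TimeoutException("batch not completed in time");
                }
                return result();
            }

            private Long result() throws ExecutionException {
                if (error != null) {
                    throw new ExecutionException(error);
                }
                return baseOffset + relativeOffset;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        SharedBatchResult batch = new SharedBatchResult();
        Future<Long> first = batch.futureFor(0);
        Future<Long> last = batch.futureFor(2);
        batch.complete(100L, null);
        // Both futures are released by the same latch.
        System.out.println(first.get() + " " + last.get());
    }
}
```

Under this design each future holds only a reference to the shared batch result and an int, and waiting on any one of them waits on the same latch.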
