kafka-users mailing list archives

From Kishore Senji <kse...@gmail.com>
Subject Re: 0.8.2 producer and single message requests
Date Wed, 02 Sep 2015 06:09:37 GMT
Yes, this will be a problem if you are providing batching for your REST
service on top of Kafka and have to acknowledge to your client only when
all the callbacks for individual sends are called.

Here is one implementation I have done:
https://github.com/ksenji/KafkaBatchProducer/blob/master/src/main/java/kafka/samples/KafkaBatchProducer.java
It takes a list of messages to send, with a single callback that gets
called once onCompletion() has been called for every message. Take a look
at BatchCallbackHelper and also the sample Producer which uses this batch
producer.
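The core idea can be sketched independently of Kafka (a minimal, hypothetical helper - the name BatchCompletion and its shape are illustrative, not the actual classes in the repository above): a counter is decremented as each per-record onCompletion() fires, and a single batch-level action runs exactly once, after the last send completes, remembering the first error seen, if any.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: aggregates N per-record completions into one
// "whole batch done" signal. Safe to call from the producer's I/O thread.
final class BatchCompletion {
    private final AtomicInteger remaining;               // sends still outstanding
    private final AtomicReference<Exception> firstError = new AtomicReference<>();
    private final Runnable onAllDone;                    // fires exactly once

    BatchCompletion(int batchSize, Runnable onAllDone) {
        this.remaining = new AtomicInteger(batchSize);
        this.onAllDone = onAllDone;
    }

    // Delegate to this from each record's onCompletion(metadata, exception).
    void complete(Exception exception) {
        if (exception != null) {
            firstError.compareAndSet(null, exception);   // keep only the first failure
        }
        if (remaining.decrementAndGet() == 0) {
            onAllDone.run();                             // last send finished
        }
    }

    Exception error() {
        return firstError.get();
    }
}
```

In real use, each producer.send() would be given a Callback whose onCompletion() delegates to complete(), and the HTTP response would be written only from onAllDone, after checking error().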



On Tue, Sep 1, 2015 at 1:42 PM Gwen Shapira <gwen@confluent.io> wrote:

> We've seen a lot of requests for this, and I don't think there is any
> general objection.
>
> If you want to discuss concrete API suggestions, perhaps the dev mailing
> list is the right place for the discussion.
>
> Gwen
>
> On Tue, Sep 1, 2015 at 11:25 AM, Neelesh <neeleshs@gmail.com> wrote:
>
> > Here's what I think:
> > # The new producer returns Java futures, and we all know the problems
> > with Java futures (they cannot be composed, they block, and they do not
> > work well with other JVM languages/libraries - RxJava/RxScala etc.)
> >
> > # Or we can pass in a callback - this works okay when we are dealing
> > with single messages, but with a batch of messages it pushes a lot of
> > bookkeeping onto the caller. The client now has to coordinate one
> > callback per producer.send(), or use a single stateful callback and
> > handle synchronization across all the state generated by the callbacks.
> > Granted, we can simplify the model because we know there is a single I/O
> > thread that runs the callbacks, but then we are relying on an
> > implementation detail. It does not feel very clean.
> >
> > Overall, when I have to send a bunch of messages synchronously, the new
> > producer does not give me a good way to model it. It feels like the new
> > producer is more prescriptive.
> >
> > Now, if the producer had just one more API that took a list of messages
> > and handed me back a callback for that list, things would have been much
> > simpler.
> >
> >
> > On Mon, Aug 17, 2015 at 10:41 PM, Kishore Senji <ksenji@gmail.com> wrote:
> >
> > > If linger.ms is 0, batching does not add to the latency. It will
> > > actually improve throughput without affecting latency. Enabling
> > > batching does not mean it will wait for the batch to be full. Whatever
> > > gets filled during the previous batch send will be sent in the current
> > > batch, even if its count is less than batch.size.
> > >
> > > You do not have to work with Future. With a callback you essentially
> > > get an async model (and you can make use of it if your webservice is
> > > using Servlet 3.0):
> > >
> > >
> > > producer.send(record, new AsyncCallback(request, response));
> > >
> > > static final class AsyncCallback implements Callback {
> > >
> > >     private final HttpServletRequest request;
> > >     private final HttpServletResponse response;
> > >
> > >     AsyncCallback(HttpServletRequest request, HttpServletResponse response) {
> > >         this.request = request;
> > >         this.response = response;
> > >     }
> > >
> > >     @Override
> > >     public void onCompletion(RecordMetadata metadata, Exception exception) {
> > >         // Check exception and send the appropriate response
> > >     }
> > > }
> > >
> > > On Mon, Aug 17, 2015 at 10:49 AM Neelesh <neeleshs@gmail.com> wrote:
> > >
> > > > Thanks for the answers. Indeed, the callback model is the same
> > > > regardless of batching. But for a synchronous web service, batching
> > > > creates a latency issue. linger.ms is by default set to zero. Also,
> > > > Java futures are hard to work with compared to Scala futures. The
> > > > current API also returns one future per single-record send (correct
> > > > me if I missed another variant), which leaves the client code to deal
> > > > with hundreds of futures and/or callbacks. Maybe I'm missing
> > > > something very obvious in the new API, but this model, and the fact
> > > > that the Scala APIs are going away, makes writing an ingestion
> > > > service in front of Kafka more involved than with the 0.8.1 API.
> > > >
> > > > On Sun, Aug 16, 2015 at 12:02 AM, Kishore Senji <ksenji@gmail.com> wrote:
> > > >
> > > > > Adding to what Gwen already mentioned -
> > > > >
> > > > > The programming model for the Producer is send() with an optional
> > > > > callback, and we get back a Future. This model does not change
> > > > > whether or not batching is done behind the scenes. So your
> > > > > fault-tolerance logic really should not depend on whether batching
> > > > > is done over the wire for performance reasons. Assuming that you
> > > > > will get better fault tolerance without batching is also not
> > > > > accurate, as you still have to check for an exception in
> > > > > onCompletion().
> > > > >
> > > > > The webservice should have a callback registered for every send()
> > > > > (which essentially gives you an async model), and based on that it
> > > > > should respond to its clients whether the call was successful or
> > > > > not. The clients of your webservice should have fault tolerance
> > > > > built on top of your response codes.
> > > > >
> > > > > I think batching is a good thing, as you get better throughput;
> > > > > plus, if you do not have linger.ms set, it does not wait until it
> > > > > completely reaches batch.size, so all the concurrent requests to
> > > > > your webservice will get batched and sent to the broker, which will
> > > > > increase the throughput of the Producer and in turn of your
> > > > > webservice.
> > > > >
> > > > > On Fri, Aug 14, 2015 at 6:10 PM Gwen Shapira <gwen@confluent.io> wrote:
> > > > >
> > > > > > Hi Neelesh :)
> > > > > >
> > > > > > The new producer has configuration for controlling the batch
> > > > > > sizes. By default, it will batch as much as possible without
> > > > > > delay (controlled by linger.ms) and without using too much memory
> > > > > > (controlled by batch.size).
> > > > > >
> > > > > > As mentioned in the docs, you can set batch.size to 0 to disable
> > > > > > batching completely if you want.
> > > > > >
> > > > > > It is worthwhile to consider using the producer callback to avoid
> > > > > > losing messages when the webservice crashes (for example, have
> > > > > > the webservice only consider messages as sent if the callback is
> > > > > > triggered for a successful send).
> > > > > >
> > > > > > You can read more information on batching here:
> > > > > > http://ingest.tips/2015/07/19/tips-for-improving-performance-of-kafka-producer/
> > > > > >
> > > > > > And some examples on how to produce data to Kafka with the new
> > > > > > producer - both with futures and callbacks - here:
> > > > > > https://github.com/gwenshap/kafka-examples/blob/master/SimpleCounter/src/main/java/com/shapira/examples/producer/simplecounter/DemoProducerNewJava.java
> > > > > >
> > > > > > Gwen
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 14, 2015 at 5:07 PM, Neelesh <neeleshs@gmail.com> wrote:
> > > > > >
> > > > > > > We are fronting all our Kafka requests with a simple web
> > > > > > > service (we do some additional massaging and writing to other
> > > > > > > stores as well). The new KafkaProducer in 0.8.2 seems very
> > > > > > > geared towards producer batching. Most of our payloads are
> > > > > > > single messages.
> > > > > > >
> > > > > > > Producer batching basically sets us up for lost messages if our
> > > > > > > web service goes down with unflushed messages in the producer.
> > > > > > >
> > > > > > > Another issue is when we have a batch of records. It looks like
> > > > > > > I have to call producer.send() for each record and deal with
> > > > > > > the individual futures returned.
> > > > > > >
> > > > > > > Are there any patterns for primarily single-message requests,
> > > > > > > without losing data? I understand the throughput will be low.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > -Neelesh
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
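As an aside, the linger.ms / batch.size behavior discussed in this thread can be illustrated with a minimal configuration sketch. This is hypothetical: the property names are the standard new-producer configs, but the bootstrap address and serializer choices are placeholders for illustration only.

```java
import java.util.Properties;

// Sketch of producer settings for low-latency opportunistic batching.
// With linger.ms=0 (the default) the producer never waits for a batch to
// fill: whatever records accumulated while the previous request was in
// flight are sent together, so batching adds throughput, not latency.
public class ProducerConfigSketch {

    static Properties lowLatencyBatchingConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("linger.ms", "0");      // send as soon as possible
        props.put("batch.size", "16384"); // per-partition batch cap, in bytes
        return props;
    }
}
```

These properties would then be passed to a new KafkaProducer, whose send() calls batch opportunistically exactly as described above.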
