kafka-users mailing list archives

From Neelesh <neele...@gmail.com>
Subject Re: 0.8.2 producer and single message requests
Date Tue, 01 Sep 2015 18:25:23 GMT
Here's what I think:
# The new producer returns Java futures, and we all know the problems with
Java futures (they cannot be composed, they block, and they do not work well
with other JVM languages/libraries such as RxJava/RxScala).

# Or we can pass in a callback. That works okay when we are dealing with
single messages, but with a batch of messages it pushes a lot of bookkeeping
onto the caller. The client now has to either coordinate one callback per
producer.send(), or use a single stateful callback and synchronize all the
state the callbacks generate. Granted, we can simplify the model because we
know a single I/O thread runs the callbacks, but then we are relying on an
implementation detail. It does not feel very clean.
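The "single stateful callback" option above does not have to rely on callbacks running on one I/O thread. A minimal sketch, assuming a hypothetical `SendCallback` interface (a stand-in for Kafka's `Callback`, with a `Long` offset in place of `RecordMetadata` so it compiles without the client jar): one shared instance keeps its state in atomics and fires exactly once, after every send in the batch has completed.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical stand-in for Kafka's Callback: a Long offset replaces
// RecordMetadata so this sketch compiles without the Kafka client.
interface SendCallback {
    void onCompletion(Long offset, Exception exception);
}

// One shared callback for N sends. All state lives in atomics, so it is
// correct no matter which thread (or how many threads) call onCompletion.
final class BatchCallback implements SendCallback {
    private final AtomicInteger remaining;
    private final AtomicReference<Exception> firstError = new AtomicReference<>();
    private final Consumer<Exception> onBatchDone; // receives null on full success

    BatchCallback(int batchSize, Consumer<Exception> onBatchDone) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        this.remaining = new AtomicInteger(batchSize);
        this.onBatchDone = onBatchDone;
    }

    @Override
    public void onCompletion(Long offset, Exception exception) {
        if (exception != null) {
            firstError.compareAndSet(null, exception); // keep only the first failure
        }
        // Exactly one caller sees the counter reach zero and reports the batch result.
        if (remaining.decrementAndGet() == 0) {
            onBatchDone.accept(firstError.get());
        }
    }
}
```

With the real client, the same class would implement `org.apache.kafka.clients.producer.Callback`, and the single instance would be passed to every `producer.send(record, callback)` in the batch.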

Overall, when I have to send a bunch of messages synchronously, the new
producer does not give me a good way to model it. It feels like the new
producer is more prescriptive.

Now if the producer had just one more API that took a list of messages and
gave me a single callback for the whole list, things would've been much simpler.
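A list-in, one-result-out helper like the one wished for above can be approximated on top of per-record callbacks. This is a sketch, not a producer API: `RecordSender`, `BatchSend.sendOne` and `BatchSend.sendAll` are hypothetical names, and the sender is modelled as `send(record, callback)` with a `(Long offset, Exception error)` callback. Each callback completes its own `CompletableFuture`, and `CompletableFuture.allOf` joins them into one future for the whole list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Hypothetical minimal producer shape, modelled on KafkaProducer.send(record, Callback):
// the callback receives an offset on success or an exception on failure.
interface RecordSender {
    void send(String record, BiConsumer<Long, Exception> callback);
}

final class BatchSend {
    // Bridges one callback-style send into a composable CompletableFuture.
    static CompletableFuture<Long> sendOne(RecordSender sender, String record) {
        CompletableFuture<Long> f = new CompletableFuture<>();
        sender.send(record, (offset, error) -> {
            if (error != null) {
                f.completeExceptionally(error);
            } else {
                f.complete(offset);
            }
        });
        return f;
    }

    // Completes once every send has finished: with all offsets (in input order)
    // on success, or exceptionally if any single send failed.
    static CompletableFuture<List<Long>> sendAll(RecordSender sender, List<String> records) {
        List<CompletableFuture<Long>> futures = new ArrayList<>();
        for (String r : records) {
            futures.add(sendOne(sender, r));
        }
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> {
                    List<Long> offsets = new ArrayList<>();
                    for (CompletableFuture<Long> f : futures) {
                        offsets.add(f.join()); // already complete, so this does not block
                    }
                    return offsets;
                });
    }
}
```

Because `CompletableFuture` composes (`thenApply`, `thenCompose`, `exceptionally`), this also sidesteps the composition complaint about plain `java.util.concurrent.Future`. With the real producer, `sendOne` would call `producer.send(record, (metadata, e) -> ...)`; `Callback` has a single method, so a lambda fits. A caller that needs synchronous semantics can simply `join()` the returned future.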


On Mon, Aug 17, 2015 at 10:41 PM, Kishore Senji <ksenji@gmail.com> wrote:

> If linger.ms is 0, batching does not add to the latency. It will actually
> improve throughput without affecting latency. Enabling batching does not
> mean the producer will wait for the batch to be full. Whatever accumulates
> while the previous batch is being sent goes out in the next batch, even if
> its size is less than batch.size.
>
> You do not have to work with Futures. With a callback you essentially get an
> async model (and you can make use of it if your webservice uses Servlet
> 3.0):
>
>
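> import javax.servlet.http.HttpServletRequest;
> import javax.servlet.http.HttpServletResponse;
> import org.apache.kafka.clients.producer.Callback;
> import org.apache.kafka.clients.producer.RecordMetadata;
>
> // Inside the servlet's request handler:
> producer.send(record, new AsyncCallback(request, response));
>
> static final class AsyncCallback implements Callback {
>     private final HttpServletRequest request;
>     private final HttpServletResponse response;
>
>     AsyncCallback(HttpServletRequest request, HttpServletResponse response) {
>         this.request = request;
>         this.response = response;
>     }
>
>     @Override
>     public void onCompletion(RecordMetadata metadata, Exception exception) {
>         // Check the exception and send the appropriate response.
>         response.setStatus(exception == null
>                 ? HttpServletResponse.SC_OK
>                 : HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
>         // Assumes the servlet called request.startAsync() before send().
>         request.getAsyncContext().complete();
>     }
> }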
>
> On Mon, Aug 17, 2015 at 10:49 AM Neelesh <neeleshs@gmail.com> wrote:
>
> > Thanks for the answers. Indeed, the callback model is the same regardless
> > of batching. But for a synchronous web service, batching creates a latency
> > issue. linger.ms is set to zero by default. Also, Java futures are hard to
> > work with compared to Scala futures. The current API also returns one
> > future per single-record send (correct me if I missed another variant),
> > which leaves the client code to deal with hundreds of futures and/or
> > callbacks. Maybe I'm missing something very obvious in the new API, but
> > this model, and the fact that the Scala APIs are going away, makes writing
> > an ingestion service in front of Kafka more involved than with the 0.8.1
> > API.
> >
> > On Sun, Aug 16, 2015 at 12:02 AM, Kishore Senji <ksenji@gmail.com> wrote:
> >
> > > Adding to what Gwen already mentioned:
> > >
> > > The programming model for the Producer is send() with an optional
> > > callback, and we get a Future back. This model does not change whether or
> > > not batching is done behind the scenes, so your fault-tolerance logic
> > > should not depend on whether batching is done over the wire for
> > > performance reasons. Assuming that you will get better fault tolerance
> > > without batching is also not accurate, as you still have to check for an
> > > exception in onCompletion().
> > >
> > > The webservice should register a callback (which essentially gives you an
> > > async model) for every send() and, based on it, tell its clients whether
> > > the call succeeded. The clients of your webservice should build their
> > > fault tolerance on top of your response codes.
> > >
> > > I think batching is a good thing because you get better throughput. If
> > > you do not set linger.ms, the producer does not wait until the batch
> > > reaches batch.size, so concurrent requests to your webservice get batched
> > > and sent to the broker together, which increases the throughput of the
> > > Producer and, in turn, of your webservice.
> > >
> > > On Fri, Aug 14, 2015 at 6:10 PM Gwen Shapira <gwen@confluent.io> wrote:
> > >
> > > > Hi Neelesh :)
> > > >
> > > > The new producer has configuration for controlling the batch sizes.
> > > > By default, it will batch as much as possible without delay (controlled
> > > > by linger.ms) and without using too much memory (controlled by
> > > > batch.size).
> > > >
> > > > As mentioned in the docs, you can set batch.size to 0 to disable batching
> > > > completely if you want.
> > > >
> > > > It is worthwhile to consider using the producer callback to avoid losing
> > > > messages when the webservice crashes (for example, have the webservice
> > > > consider messages as sent only if the callback is triggered for a
> > > > successful send).
> > > >
> > > > You can read more information on batching here:
> > > >
> > > > http://ingest.tips/2015/07/19/tips-for-improving-performance-of-kafka-producer/
> > > >
> > > > And some examples of how to produce data to Kafka with the new producer,
> > > > both with futures and callbacks, here:
> > > >
> > > > https://github.com/gwenshap/kafka-examples/blob/master/SimpleCounter/src/main/java/com/shapira/examples/producer/simplecounter/DemoProducerNewJava.java
> > > >
> > > > Gwen
> > > >
> > > >
> > > >
> > > > On Fri, Aug 14, 2015 at 5:07 PM, Neelesh <neeleshs@gmail.com> wrote:
> > > >
> > > > > We are fronting all our Kafka requests with a simple web service (we do
> > > > > some additional massaging and writing to other stores as well). The new
> > > > > KafkaProducer in 0.8.2 seems very geared towards producer batching. Most
> > > > > of our payloads are single messages.
> > > > >
> > > > > Producer batching basically sets us up for lost messages if our web
> > > > > service goes down with unflushed messages in the producer.
> > > > >
> > > > > Another issue is when we have a batch of records. It looks like I have
> > > > > to call producer.send for each record and deal with the individual
> > > > > futures returned.
> > > > >
> > > > > Are there any patterns for primarily single-message requests that do
> > > > > not lose data? I understand the throughput will be low.
> > > > >
> > > > > Thanks!
> > > > > -Neelesh
> > > > >
> > > >
> > >
> >
>
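The two batching settings discussed in this thread, linger.ms and batch.size, are ordinary producer properties. A minimal sketch with `java.util.Properties`, assuming the result would be passed to `new KafkaProducer<>(props)`; the broker address is a placeholder and the helper names are hypothetical. `batch.size` is measured in bytes (16384 by default), and setting it to 0 disables batching, as Gwen notes.

```java
import java.util.Properties;

final class ProducerConfigs {
    // Batching disabled entirely: batch.size=0 turns producer batching off.
    static Properties noBatching() {
        Properties p = base();
        p.put("batch.size", "0");
        return p;
    }

    // Opportunistic batching: linger.ms=0 means the producer never waits for a
    // batch to fill; records that accumulate while a previous request is in
    // flight simply go out together, up to batch.size bytes per partition batch.
    static Properties opportunisticBatching() {
        Properties p = base();
        p.put("linger.ms", "0");
        p.put("batch.size", "16384");
        return p;
    }

    private static Properties base() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return p;
    }
}
```

Neither setting changes the callback contract: in both cases a record should only be treated as sent once its callback reports success.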
