kafka-users mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Re: New Producer API - batched sync mode support
Date Mon, 27 Apr 2015 20:53:36 GMT
The important guarantee a client producer thread needs is an indication
of success or failure for the batch of events it pushed, so that it can
retry producer.send() on that same batch if it fails. My understanding is
that flush will simply flush data from all threads (correct me if I am
wrong).
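
To make the requirement concrete, here is a rough sketch of the per-batch bookkeeping a client has to do itself today. It is not Kafka code: sendFn is a hypothetical stand-in for any call that, like producer.send(), returns a Future per event, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.Function;

public class BatchedSyncSend {

    /**
     * Hands every event to sendFn (hypothetical stand-in for producer.send()),
     * then blocks until all returned futures complete.
     * Returns true only if every send in this batch succeeded.
     */
    public static <T> boolean sendBatch(List<T> batch, Function<T, Future<?>> sendFn) {
        List<Future<?>> pending = new ArrayList<>(batch.size());
        for (T event : batch) {
            pending.add(sendFn.apply(event)); // asynchronous hand-off, no blocking yet
        }
        boolean allOk = true;
        for (Future<?> f : pending) {
            try {
                f.get(); // wait for this event's outcome
            } catch (ExecutionException e) {
                allOk = false; // remember the failure, keep draining the rest
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
                return false;
            }
        }
        return allOk;
    }

    /**
     * Re-sends the whole batch until it succeeds or maxAttempts is reached.
     * Events that already succeeded are sent again on a retry, so this gives
     * at-least-once delivery and may produce duplicates.
     */
    public static <T> boolean sendWithRetry(List<T> batch,
                                            Function<T, Future<?>> sendFn,
                                            int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (sendBatch(batch, sendFn)) {
                return true;
            }
        }
        return false;
    }
}
```

The result is scoped to the futures this thread collected, which is the property I am after. A flush that covers all threads would not by itself tell one thread whether its own batch made it.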

-roshan



On 4/27/15 1:36 PM, "Joel Koshy" <jjkoshy.w@gmail.com> wrote:

>This sounds like flush:
>https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API
>
>which was recently implemented in trunk.
>
>Joel
>
>On Mon, Apr 27, 2015 at 08:19:40PM +0000, Roshan Naik wrote:
>> Been evaluating the perf of the old and new Producer APIs for reliable
>>high volume streaming data movement. I do see one area of improvement
>>that the new API could use for synchronous clients.
>> 
>> AFAICT, the new API does not support batched synchronous transfers. To
>>do a synchronous send, one needs to do a future.get() after every
>>Producer.send(). I changed the new
>>o.a.k.clients.tools.ProducerPerformance tool to assess the perf of this
>>mode of operation. It may not be surprising that it is much slower than
>>the async mode... hard to push it beyond 4MB/s.
>> 
>> The 0.8.1 Scala-based producer API supported a batched sync mode via
>>Producer.send( List<KeyedMessage> ) . My measurements show that it was
>>able to approach (and sometimes exceed) the old async speeds... 266MB/s
>> 
>> 
>> Supporting this batched sync mode is critical for streaming clients
>>(such as Flume, for example) that need delivery guarantees. Although it
>>can be done with async mode, that requires additional bookkeeping as to
>>which events were delivered and which were not. The programming model
>>becomes much simpler with the batched sync mode. Having the client deal
>>with a single future.get() per batch also helps performance greatly, as
>>I noted.
>> 
>> Wanted to propose adding this as an enhancement to the new Producer API.
>

