kafka-users mailing list archives

From Jiangjie Qin <j...@linkedin.com.INVALID>
Subject Re: How to prevent custom Partitioner from increasing the number of producer's requests?
Date Wed, 03 Jun 2015 18:04:18 GMT

Are you using the new producer or the old producer?
The old producer has a 10-minute sticky partition behavior while the new
producer does not.
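
On the off-by-one idea raised further down the thread, here is a minimal
sketch of how a counter-based round robin could double the request count.
It is only a simulation, not the code from the gist: partition_for,
count_requests and the start offset are hypothetical names, and it assumes
one request per distinct partition touched by a batch (that is, each
partition leader sits on a different broker).

```python
def partition_for(counter, num_partitions, switch_every):
    # Round robin that moves to the next partition every `switch_every` messages.
    return (counter // switch_every) % num_partitions


def count_requests(num_batches, batch_size, num_partitions, switch_every, start=0):
    # Assumption: a batch costs one request per distinct partition it touches.
    requests = 0
    counter = start
    for _ in range(num_batches):
        touched = {
            partition_for(c, num_partitions, switch_every)
            for c in range(counter, counter + batch_size)
        }
        requests += len(touched)
        counter += batch_size
    return requests


# Partition switch lines up with the batch boundary: one partition per batch.
aligned = count_requests(10, 200, 4, 200, start=0)
# Counter shifted by one message: every batch straddles two partitions.
shifted = count_requests(10, 200, 4, 200, start=1)
print(aligned, shifted)  # 10 20
```

If the real counter is shifted relative to the batch boundary in a similar
way, each batch of 200 would land on two partitions, which would match the
doubling being reported.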

Thanks,

Jiangjie (Becket) Qin

On 6/2/15, 11:58 PM, "Sebastien Falquier" <sebastien.falquier@teads.tv>
wrote:

>Hi Jason,
>
>The default partitioner does not do the job since my producers do not have
>smooth traffic. What I mean is that they can deliver lots of messages
>during 10 minutes and fewer during the next 10 minutes, that is to say the
>first partition will have accumulated most of the messages of the last 20
>minutes.
>
>By the way, I don't understand your point about breaking batch into 2
>separate partitions. With that code, I jump to a new partition on message
>201, 401, 601, ... with batch size = 200, where is my mistake?
>
>Thanks for your help,
>Sébastien
>
>2015-06-02 16:55 GMT+02:00 Jason Rosenberg <jbr@squareup.com>:
>
>> Hi Sebastien,
>>
>> You might just try using the default partitioner (which is random).  It
>> works by choosing a random partition each time it re-polls the meta-data
>> for the topic.  By default, this happens every 10 minutes for each topic
>> you produce to (so it evenly distributes load at a granularity of 10
>> minutes).  This is based on 'topic.metadata.refresh.interval.ms'.
>>
>> I suspect your code is causing double requests for each batch, if your
>> partitioning is actually breaking up your batches into 2 separate
>> partitions.  Could be an off by 1 error, with your modulo calculation?
>> Perhaps you need to use '% 0' instead of '% 1' there?
>>
>> Jason
>>
>>
>>
>> On Tue, Jun 2, 2015 at 3:35 AM, Sebastien Falquier <
>> sebastien.falquier@teads.tv> wrote:
>>
>> > Hi guys,
>> >
>> > I am new to Kafka and I am facing a problem I am not able to sort out.
>> >
>> > To smooth traffic over all my brokers' partitions, I have coded a custom
>> > Partitioner for my producers, using a simple round robin algorithm that
>> > jumps from one partition to another on every batch of messages
>> > (corresponding to the batch.num.messages value). It looks like this:
>> > https://gist.github.com/sfalquier/4c0c7f36dd96d642b416
>> >
>> > With that fix, all partitions are used equally, but the number of
>> > requests from the producers to the brokers has been multiplied by 2. I
>> > do not understand why, since all producers are async with
>> > batch.num.messages=200 and the number of messages processed is still the
>> > same as before. Why do producers need more requests to do the job? As
>> > internal traffic is a bit critical on our platform, I would really like
>> > to reduce the producers' request volume if possible.
>> >
>> > Any idea? Any suggestion?
>> >
>> > Regards,
>> > Sébastien
>> >
>>

