kafka-users mailing list archives

From Sebastien Falquier <sebastien.falqu...@teads.tv>
Subject Re: How to prevent custom Partitioner from increasing the number of producer's requests?
Date Wed, 03 Jun 2015 06:58:20 GMT
Hi Jason,

The default partitioner does not do the job, since my producers' traffic is
not smooth. What I mean is that they can deliver lots of messages during 10
minutes and far fewer during the next 10 minutes; that is to say, the first
partition will have accumulated most of the messages of the last 20
minutes.
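
The imbalance described above can be sketched with a small simulation that
does not touch Kafka. It assumes, as Jason's message says, that the default
partitioner sticks to one randomly chosen partition per metadata refresh
window; the function name, the per-window volumes, and the partition count
are hypothetical values chosen only for illustration.

```python
import random


def sticky_partition_load(window_volumes, num_partitions, seed=0):
    """Simulate a partitioner that picks one random partition per window.

    window_volumes: number of messages produced in each refresh window
    (hypothetical figures, not measurements from a real cluster).
    Returns the total number of messages that landed on each partition.
    """
    rng = random.Random(seed)
    load = [0] * num_partitions
    for volume in window_volumes:
        # Every message of the window goes to the same partition.
        partition = rng.randrange(num_partitions)
        load[partition] += volume
    return load


# A busy 10-minute window followed by a quiet one, over 4 partitions.
load = sticky_partition_load([10000, 500], num_partitions=4)
print(load)
```

Whichever partition is drawn for the busy window ends up holding at least
10000 of the 10500 messages, while at least two partitions stay empty.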

By the way, I don't understand your point about breaking a batch into 2
separate partitions. With that code, I jump to a new partition on message
201, 401, 601, ... with batch size = 200. Where is my mistake?
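
One way to check the batch-splitting hypothesis is a small simulation,
again without Kafka. It is only a sketch under two assumptions that are not
confirmed in this thread: the partitioner keeps a running message counter
and switches partition every batch_size messages, and the producer cuts a
batch every batch_size messages. The function name and the counter_start
parameter are hypothetical; the code in the gist may differ.

```python
def partitions_per_batch(num_messages, batch_size, num_partitions,
                         counter_start=0):
    """Return, for each producer batch, the set of partitions it touches.

    counter_start models a partitioner counter that is out of step with
    the producer's batch boundaries (an off-by-one when it equals 1).
    """
    counter = counter_start
    batches = []
    current = set()
    for i in range(num_messages):
        # Round robin: move to the next partition every batch_size messages.
        partition = (counter // batch_size) % num_partitions
        counter += 1
        current.add(partition)
        # Assumption: the producer closes a batch every batch_size messages.
        if (i + 1) % batch_size == 0:
            batches.append(current)
            current = set()
    if current:
        batches.append(current)
    return batches


aligned = partitions_per_batch(600, 200, 3)
shifted = partitions_per_batch(600, 200, 3, counter_start=1)
print([len(b) for b in aligned])  # partitions touched per batch, aligned
print([len(b) for b in shifted])  # partitions touched per batch, shifted
```

When the counter is aligned with the batch boundaries, each batch touches a
single partition. When it is shifted by one message, every batch straddles
two partitions, which would mean two requests per batch if those partitions
live on different brokers. The same drift could also appear if batches are
flushed on a timer (queue.buffering.max.ms) before reaching 200 messages.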

Thanks for your help,
Sébastien

2015-06-02 16:55 GMT+02:00 Jason Rosenberg <jbr@squareup.com>:

> Hi Sebastien,
>
> You might just try using the default partitioner (which is random).  It
> works by choosing a random partition each time it re-polls the meta-data
> for the topic.  By default, this happens every 10 minutes for each topic
> you produce to (so it evenly distributes load at a granularity of 10
> minutes).  This is based on 'topic.metadata.refresh.interval.ms'.
>
> I suspect your code is causing double requests for each batch, if your
> partitioning is actually breaking up your batches into 2 separate
> partitions.  Could be an off-by-one error in your modulo calculation?
> Perhaps you need to use '% 0' instead of '% 1' there?
>
> Jason
>
>
>
> On Tue, Jun 2, 2015 at 3:35 AM, Sebastien Falquier <
> sebastien.falquier@teads.tv> wrote:
>
> > Hi guys,
> >
> > I am new to Kafka and I am facing a problem I am not able to sort out.
> >
> > To smooth traffic over all my brokers' partitions, I have coded a custom
> > Partitioner for my producers, using a simple round-robin algorithm that
> > jumps from one partition to another on every batch of messages
> > (corresponding to the batch.num.messages value). It looks like this:
> > https://gist.github.com/sfalquier/4c0c7f36dd96d642b416
> >
> > With that fix, all partitions are used equally, but the number of
> > requests from the producers to the brokers has doubled. I do not
> > understand why, since all producers are async with batch.num.messages=200
> > and the number of messages processed is still the same as before. Why do
> > producers need more requests to do the job? As internal traffic is a bit
> > critical on our platform, I would really like to reduce the producers'
> > request volume if possible.
> >
> > Any idea? Any suggestion?
> >
> > Regards,
> > Sébastien
> >
>
