kafka-users mailing list archives

From Sebastien Falquier <sebastien.falqu...@teads.tv>
Subject Re: How to prevent custom Partitioner from increasing the number of producer's requests?
Date Thu, 04 Jun 2015 06:35:22 GMT
I am using this code (from "org.apache.kafka" % "kafka_2.10" % "0.8.2.0"),
no idea if it is the old producer or the new one...

import kafka.producer.Producer
import kafka.producer.ProducerConfig

val prodConfig: ProducerConfig = new ProducerConfig(properties)
val producer: Producer[Integer, String] = new Producer[Integer, String](prodConfig)

How can I know which producer I am using? And what is the behavior of the
new producer?

Thanks,
Sébastien
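[Editor's note: the import in the snippet above already answers the question: kafka.producer.Producer is the old Scala producer, while the new producer shipped in the same 0.8.2.0 artifact lives under org.apache.kafka.clients.producer.KafkaProducer. A minimal sketch of telling the two apart by package name (no broker or Kafka jar needed, the names are just strings here):]

```scala
// Hedged sketch: the fully-qualified class name tells the two producers apart.
// kafka.producer.Producer                         -> old (Scala) producer
// org.apache.kafka.clients.producer.KafkaProducer -> new (Java) producer
object WhichProducer {
  // True if a fully-qualified producer class name belongs to the old Scala producer.
  def isOldProducer(fqcn: String): Boolean = fqcn.startsWith("kafka.producer.")

  def main(args: Array[String]): Unit = {
    println(isOldProducer("kafka.producer.Producer"))                          // true
    println(isOldProducer("org.apache.kafka.clients.producer.KafkaProducer"))  // false
  }
}
```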


2015-06-03 20:04 GMT+02:00 Jiangjie Qin <jqin@linkedin.com.invalid>:

>
> Are you using new producer or old producer?
> The old producer has 10 min sticky partition behavior while the new
> producer does not.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On 6/2/15, 11:58 PM, "Sebastien Falquier" <sebastien.falquier@teads.tv>
> wrote:
>
> >Hi Jason,
> >
> >The default partitioner does not do the job, since my producers' traffic
> >is not smooth. They can deliver lots of messages during 10 minutes and
> >far fewer during the next 10 minutes, which is to say the first partition
> >will have accumulated most of the messages of the last 20 minutes.
> >
> >By the way, I don't understand your point about breaking a batch into 2
> >separate partitions. With that code, I jump to a new partition on
> >messages 201, 401, 601, ... with batch size = 200. Where is my mistake?
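[Editor's note: the batch-wise round-robin described here (jump to the next partition every batch.num.messages) can be sketched as follows; the class and method names are illustrative, not the gist's actual code:]

```scala
// Illustrative sketch (not the gist's code): a counter-based round-robin that
// keeps batchSize consecutive messages on one partition, then moves to the next.
class BatchRoundRobin(numPartitions: Int, batchSize: Int) {
  private var counter = 0L

  def nextPartition(): Int = {
    // Integer division keeps `batchSize` consecutive messages on one partition.
    val partition = ((counter / batchSize) % numPartitions).toInt
    counter += 1
    partition
  }
}
```

With batchSize = 200, messages 1 through 200 land on partition 0 and message 201 is the first to land on partition 1, matching the jump points described above.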
> >
> >Thanks for your help,
> >Sébastien
> >
> >2015-06-02 16:55 GMT+02:00 Jason Rosenberg <jbr@squareup.com>:
> >
> >> Hi Sebastien,
> >>
> >> You might just try using the default partitioner (which is random).  It
> >> works by choosing a random partition each time it re-polls the meta-data
> >> for the topic.  By default, this happens every 10 minutes for each topic
> >> you produce to (so it evenly distributes load at a granularity of 10
> >> minutes).  This is based on 'topic.metadata.refresh.interval.ms'.
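[Editor's note: the interval Jason mentions is an ordinary producer property of the old producer. A hedged config sketch (broker list and serializer values are placeholders) that shortens the refresh so the random default partitioner re-picks a partition more often, at the cost of more metadata requests:]

```scala
import java.util.Properties

// Sketch of an old-producer config; broker list and serializer are placeholders.
object OldProducerConfig {
  val props: Properties = {
    val p = new Properties()
    p.put("metadata.broker.list", "localhost:9092")   // placeholder broker list
    p.put("serializer.class", "kafka.serializer.StringEncoder")
    p.put("producer.type", "async")
    p.put("batch.num.messages", "200")
    // Default is 600000 (10 minutes); a lower value makes the default random
    // partitioner re-pick a partition more often, at the cost of more
    // metadata requests.
    p.put("topic.metadata.refresh.interval.ms", "60000")
    p
  }
}
```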
> >>
> >> I suspect your code is causing double requests for each batch, if your
> >> partitioning is actually breaking up your batches into 2 separate
> >> partitions.  Could be an off by 1 error, with your modulo calculation?
> >> Perhaps you need to use '% 0' instead of '% 1' there?
> >>
> >> Jason
> >>
> >>
> >>
> >> On Tue, Jun 2, 2015 at 3:35 AM, Sebastien Falquier <
> >> sebastien.falquier@teads.tv> wrote:
> >>
> >> > Hi guys,
> >> >
> >> > I am new to Kafka and I am facing a problem I am not able to sort out.
> >> >
> >> > To smooth traffic over all my brokers' partitions, I have coded a
> >> > custom Partitioner for my producers, using a simple round-robin
> >> > algorithm that jumps from one partition to another on every batch of
> >> > messages (corresponding to the batch.num.messages value). It looks
> >> > like this:
> >> > https://gist.github.com/sfalquier/4c0c7f36dd96d642b416
> >> >
> >> > With that fix, all partitions are used equally, but the number of
> >> > requests from the producers to the brokers has doubled. I do not
> >> > understand this, since all producers are async with
> >> > batch.num.messages=200 and the number of messages processed is the
> >> > same as before. Why do producers need more requests to do the job?
> >> > As internal traffic is a bit critical on our platform, I would
> >> > really like to reduce the producers' request volume if possible.
> >> >
> >> > Any idea? Any suggestion?
> >> >
> >> > Regards,
> >> > Sébastien
> >> >
> >>
>
>
