kafka-users mailing list archives

From Sebastien Falquier <sebastien.falqu...@teads.tv>
Subject Re: How to prevent custom Partitioner from increasing the number of producer's requests?
Date Thu, 11 Jun 2015 14:50:20 GMT
Thanks Jiangjie. The new producer is exactly what I was looking for, and it
works perfectly in production. It deserves better documentation on the
official site.
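
For reference, a minimal sketch of setting it up from Scala (the broker
address and topic name below are placeholders, not details from this thread):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val props = new Properties()
props.put("bootstrap.servers", "broker1:9092")  // placeholder broker list
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
// A null-key record lets the default partitioner spread load across partitions.
producer.send(new ProducerRecord[String, String]("my-topic", "hello"))
producer.close()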

Jason: you're right, I missed a point with my AtomicIntegers.

Regards,
Sebastien

2015-06-04 20:02 GMT+02:00 Jason Rosenberg <jbr@squareup.com>:

> Sebastien, I think you may have an off-by-one error (e.g. a batch should
> cover messages 0-199, not 1-200).  Thus you are sending 2 batches each time
> (one for message 0, another for messages 1-199).
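>
> A hypothetical illustration (the real counter logic is in your gist;
> `counter` and `nextPartition` are made-up names):
>
>   val n = counter.getAndIncrement()   // n = 0, 1, 2, ...
>   if (n % 200 == 1) nextPartition()   // buggy: rolls over at n = 1, 201, ...
>   if (n % 200 == 0) nextPartition()   // fixed: rolls over at n = 0, 200, ...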
>
> Jason
>
> On Thu, Jun 4, 2015 at 1:33 PM, Jiangjie Qin <jqin@linkedin.com.invalid>
> wrote:
>
> > From the code you pasted, that is the old producer.
> > The new producer class is org.apache.kafka.clients.producer.KafkaProducer.
> >
> > The new producer does not have the sticky partition behavior. Its default
> > partitioner uses a round-robin-like approach to send non-keyed messages to
> > partitions.
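> >
> > For illustration (a sketch; the topic name is a placeholder and `producer`
> > is assumed to be a KafkaProducer[Integer, String]):
> >
> >   // No key: the default partitioner rotates the record across
> >   // available partitions.
> >   producer.send(new ProducerRecord[Integer, String]("my-topic", "payload"))
> >
> >   // With a key: the partition is derived from the key, so the same
> >   // key always lands on the same partition.
> >   producer.send(new ProducerRecord[Integer, String]("my-topic", 42, "payload"))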
> >
> > Jiangjie (Becket) Qin
> >
> > On 6/3/15, 11:35 PM, "Sebastien Falquier" <sebastien.falquier@teads.tv>
> > wrote:
> >
> > >I am using this code (from "org.apache.kafka" % "kafka_2.10" % "0.8.2.0"),
> > >no idea if it is the old producer or the new one....
> > >
> > >import kafka.producer.Producer
> > >import kafka.producer.ProducerConfig
> > >val prodConfig : ProducerConfig = new ProducerConfig(properties)
> > >val producer : Producer[Integer,String] = new Producer[Integer,String](prodConfig)
> > >
> > >How can I know which producer I am using? And what is the behavior of the
> > >new producer?
> > >
> > >Thanks,
> > >Sébastien
> > >
> > >
> > >2015-06-03 20:04 GMT+02:00 Jiangjie Qin <jqin@linkedin.com.invalid>:
> > >
> > >>
> > >> Are you using the new producer or the old producer?
> > >> The old producer has a 10-minute sticky-partition behavior, while the
> > >> new producer does not.
> > >>
> > >> Thanks,
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
> > >> On 6/2/15, 11:58 PM, "Sebastien Falquier" <sebastien.falquier@teads.tv>
> > >> wrote:
> > >>
> > >> >Hi Jason,
> > >> >
> > >> >The default partitioner does not do the job since my producers don't
> > >> >have smooth traffic. What I mean is that they can deliver lots of
> > >> >messages during 10 minutes and fewer during the next 10 minutes, which
> > >> >means the first partition will have stacked most of the messages of the
> > >> >last 20 minutes.
> > >> >
> > >> >By the way, I don't understand your point about breaking a batch into 2
> > >> >separate partitions. With that code, I jump to a new partition on
> > >> >messages 201, 401, 601, ... with batch size = 200, so where is my
> > >> >mistake?
> > >> >
> > >> >Thanks for your help,
> > >> >Sébastien
> > >> >
> > >> >2015-06-02 16:55 GMT+02:00 Jason Rosenberg <jbr@squareup.com>:
> > >> >
> > >> >> Hi Sebastien,
> > >> >>
> > >> >> You might just try using the default partitioner (which is random).
> > >> >> It works by choosing a random partition each time it re-polls the
> > >> >> metadata for the topic.  By default, this happens every 10 minutes for
> > >> >> each topic you produce to (so it evenly distributes load at a
> > >> >> granularity of 10 minutes).  This is based on
> > >> >> 'topic.metadata.refresh.interval.ms'.
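> > >> >>
> > >> >> For instance, lowering that interval in the producer properties makes
> > >> >> the old producer hop to a new random partition more often (the
> > >> >> 1-minute value below is just an illustration):
> > >> >>
> > >> >>   # default is 600000 (10 minutes)
> > >> >>   topic.metadata.refresh.interval.ms=60000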
> > >> >>
> > >> >> I suspect your code is causing double requests for each batch, if
> > >> >> your partitioning is actually breaking up your batches into 2 separate
> > >> >> partitions.  Could it be an off-by-one error in your modulo
> > >> >> calculation?  Perhaps you need to compare the result to 0 instead of 1
> > >> >> there?
> > >> >>
> > >> >> Jason
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Tue, Jun 2, 2015 at 3:35 AM, Sebastien Falquier <sebastien.falquier@teads.tv> wrote:
> > >> >>
> > >> >> > Hi guys,
> > >> >> >
> > >> >> > I am new to Kafka and I am facing a problem I am not able to sort
> > >> >> > out.
> > >> >> >
> > >> >> > To smooth traffic over all my brokers' partitions, I have coded a
> > >> >> > custom Partitioner for my producers, using a simple round-robin
> > >> >> > algorithm that jumps from one partition to another on every batch of
> > >> >> > messages (corresponding to the batch.num.messages value). It looks
> > >> >> > like this:
> > >> >> > https://gist.github.com/sfalquier/4c0c7f36dd96d642b416
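> > >> >> >
> > >> >> > Roughly, the shape is (a sketch; class and field names here are
> > >> >> > illustrative, the real code is in the gist):
> > >> >> >
> > >> >> >   import java.util.concurrent.atomic.AtomicLong
> > >> >> >   import kafka.producer.Partitioner
> > >> >> >   import kafka.utils.VerifiableProperties
> > >> >> >
> > >> >> >   class BatchRoundRobinPartitioner(props: VerifiableProperties) extends Partitioner {
> > >> >> >     private val counter = new AtomicLong(0)
> > >> >> >     private val batchSize = 200  // should match batch.num.messages
> > >> >> >
> > >> >> >     // Stay on one partition for a full batch, then move to the next.
> > >> >> >     override def partition(key: Any, numPartitions: Int): Int =
> > >> >> >       ((counter.getAndIncrement() / batchSize) % numPartitions).toInt
> > >> >> >   }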
> > >> >> >
> > >> >> > With that fix, all partitions are used equally, but the number of
> > >> >> > requests from the producers to the brokers has been multiplied by 2.
> > >> >> > I do not understand, since all producers are async with
> > >> >> > batch.num.messages=200 and the number of messages processed is still
> > >> >> > the same as before. Why do the producers need more requests to do
> > >> >> > the job? As internal traffic is a bit critical on our platform, I
> > >> >> > would really like to reduce the producers' request volume if
> > >> >> > possible.
> > >> >> >
> > >> >> > Any idea? Any suggestion?
> > >> >> >
> > >> >> > Regards,
> > >> >> > Sébastien
> > >> >> >
> > >> >>
> > >>
> > >>
> >
> >
>
