kafka-users mailing list archives

From Pulkit Manchanda <pulkit....@gmail.com>
Subject Re: Doubts in Kafka
Date Tue, 08 Jan 2019 16:28:11 GMT
Yes, as Todd said, you have to use some ID as the message key to partition
on. Rebalancing will be an overhead, and if you increase the number of
partitions later you will lose the ordering.
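
A minimal sketch of why the key fixes the partition, and why changing the partition count breaks that mapping. It assumes the usual "hash of key modulo number of partitions" scheme. CRC32 is only a stand-in hash here (Kafka's default partitioner uses murmur2, so real partition numbers will differ), and the sensor IDs and the 1200-partition figure are made up for illustration.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition as hash(key) mod partition count.

    Assumption: CRC32 stands in for the real hash. Only the modulo
    structure matters for this illustration.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical sensor IDs, one per sensor.
sensor_ids = [f"sensor-{i}" for i in range(10000)]

# Same key and same partition count always give the same partition,
# so all messages for one sensor stay in one ordered log.
assert partition_for("sensor-42", 1000) == partition_for("sensor-42", 1000)

# Expand the topic from 1000 to 1200 partitions and count how many
# sensors would now be routed to a different partition.
moved = sum(
    1
    for sensor_id in sensor_ids
    if partition_for(sensor_id, 1000) != partition_for(sensor_id, 1200)
)
print(f"{moved} of {len(sensor_ids)} sensors map to a different partition")
```

For a sensor that moves, new messages go to a different partition than its older, possibly still unconsumed messages, which is where the ordering guarantee is lost.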

For more detail, see
https://anirudhbhatnagar.com/2016/08/22/achieving-order-guarnetee-in-kafka-with-partitioning/
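
Todd's sizing example, quoted below (1000 partitions, starting with 10 processors and growing toward 1000), can be sketched roughly as follows. The helper `assign_partitions` is hypothetical and only illustrates an even, contiguous split. In a real consumer group the configured assignor decides which consumer gets which partitions.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, num_processors: int) -> Dict[int, List[int]]:
    """Split partitions over processors in contiguous ranges, as evenly as possible.

    Hypothetical helper for illustration only; it is not a Kafka API.
    """
    base, extra = divmod(num_partitions, num_processors)
    assignment: Dict[int, List[int]] = {}
    start = 0
    for processor in range(num_processors):
        # The first `extra` processors take one additional partition.
        count = base + (1 if processor < extra else 0)
        assignment[processor] = list(range(start, start + count))
        start += count
    return assignment


# Start small: 10 processors share 1000 partitions, 100 each.
start_small = assign_partitions(1000, 10)
# Full scale: 1000 processors, one partition each.
full_scale = assign_partitions(1000, 1000)

print(len(start_small[0]), len(full_scale[0]))  # prints "100 1"
```

Each partition belongs to exactly one processor at a time, so every sensor's messages are still handled serially no matter how many processors are running.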

Pulkit

On Tue, Jan 8, 2019 at 11:23 AM Todd Palino <tpalino@gmail.com> wrote:

> OK, in that case you’ll want to do something like use the sensor ID as the
> key of the message. This will ensure that every message for that sensor ID
> ends up in the same partition (which in turn ensures strict ordering of
> messages for that sensor ID).
>
> Then you can create a number of partitions to get the parallelism you
> desire. For example, if you anticipate having no more than 1000 message
> processors, you would create 1000 partitions. In this way, each processor
> can consume messages from a single partition. In addition, you could work
> up to that point. You could have 10 processors to start with, and each
> would consume from 100 partitions. They would receive messages from each
> partition in order (for that partition), so you will ensure serial
> processing of each sensor.
>
> Note that I wouldn’t create more than 1000 partitions or so for a single
> topic - it tends to give the rebalancing algorithms headaches and slow down
> consumer rebalances above that. Also, you want to set up the topics with
> the number of partitions once, and not expand the number of partitions
> later. When you expand partitions, the affinity of key to partition
> changes, so you may end up with out of order messages for a short period of
> time when you expand.
>
> -Todd
>
> On Tue, Jan 8, 2019 at 11:11 AM aruna ramachandran <arunaeienec@gmail.com>
> wrote:
>
> > I need to process each sensor's messages serially (the order of messages
> > must not change). At the same time, I have to process the messages of
> > 10000 sensors in parallel. Please help me configure the topics and
> > partitions.
> >
> > On Tue, Jan 8, 2019 at 9:19 PM Todd Palino <tpalino@gmail.com> wrote:
> >
> > > I think you’ll need to expand a little more here and explain what you
> > > mean by processing them in parallel. Nearly by definition,
> > > parallelization and strict ordering are mutually exclusive concepts.
> > >
> > > -Todd
> > >
> > > On Tue, Jan 8, 2019 at 10:40 AM aruna ramachandran <arunaeienec@gmail.com>
> > > wrote:
> > >
> > > > I need to process the messages of 10000 sensors in parallel, but each
> > > > sensor's messages should stay in order. If I create 10000 partitions,
> > > > it doesn't give high throughput. Order is guaranteed only within a
> > > > partition. How can I parallelize the messages without changing their
> > > > order? Please help me find a solution.
> > > >
> > >
> > >
> > > --
> > > *Todd Palino*
> > > Senior Staff Engineer, Site Reliability
> > > Data Infrastructure Streaming
> > >
> > >
> > >
> > > linkedin.com/in/toddpalino
> > >
> >
>
>
> --
> *Todd Palino*
> Senior Staff Engineer, Site Reliability
> Capacity Engineering
>
>
>
> linkedin.com/in/toddpalino
>
