kafka-users mailing list archives

From Tom Crayford <tcrayf...@heroku.com>
Subject Re: Increase number of topic in Kafka leads zookeeper fail
Date Tue, 17 May 2016 11:53:25 GMT
Hi,

On Monday, 16 May 2016, Abhaya P <abhayapat@gmail.com> wrote:

> I was reading a nice summary article by Jun Rao on the implications of # of
> topics/partitions.
>
> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
>
> There are many trade-offs to consider, it seems.
>
> Finding the partition for a key: can a 'custom partitioner' be employed so
> that a consumer can derive (compute) a partition id from a key and read that
> partition directly, instead of scanning all the partitions in a topic?


Yep. Indeed, the Java producer's default partitioner does just this: it
hashes the record key to pick the partition.
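As a rough illustration, here is a minimal Python sketch of a deterministic key-to-partition mapping that a producer and a consumer could both compute. It assumes a fixed partition count and uses `zlib.crc32` as a stand-in hash; Kafka's Java default partitioner uses murmur2, so these partition numbers will not match the ones Kafka actually assigns. The helper name `partition_for` and the partition count are hypothetical.

```python
import zlib


def partition_for(device_id: str, num_partitions: int) -> int:
    """Map a device id to a partition number in [0, num_partitions).

    Hypothetical helper: CRC32 is a stand-in hash, not Kafka's murmur2,
    so the result differs from what Kafka's default partitioner picks.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # In Python 3, crc32 returns an unsigned 32-bit int, so the result is non-negative.
    return zlib.crc32(device_id.encode("utf-8")) % num_partitions


# A producer and a consumer that agree on the hash and the partition count
# compute the same partition for a device, so the consumer can read only
# that partition instead of scanning the whole topic.
NUM_PARTITIONS = 32  # assumed partition count
p = partition_for("device-42", NUM_PARTITIONS)
print(f"device-42 -> partition {p}")
```

The mapping only holds while the partition count stays the same: adding partitions to the topic moves existing keys to different partitions.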


>
> A thought:
> Using a partition in place of a topic per device could, optionally, provide
> one level of hierarchy with some manageability options, in that a topic can
> be for a certain type of IoT device or a certain geography of IoT, etc.


Using a partition per device will most likely have the same issue. Most of
the ZooKeeper state that Kafka stores, which is likely what is causing the
issues here, is per partition.

Instead I'd recommend hashing keys per device across partitions. This is a
common use case and a common solution, and it has worked out well in
production for years at many companies. A partition or topic per device
will not work well, and will never work with indefinite growth, as
ZooKeeper effectively puts an upper limit on the number of partitions in a
Kafka cluster.
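To make the contrast concrete, here is a minimal in-memory Python sketch (no Kafka client involved) of hashing device keys onto one topic with a fixed number of partitions, using the 10,000-device figure from the original question. However many devices are added, the partition count stays bounded, and the backend recovers per-device data by grouping records on their key. The partition count, the CRC32 stand-in hash (Kafka's default partitioner uses murmur2), and the helper names `partition_for`, `simulate`, and `readings_by_device` are all assumptions for illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 64  # assumed fixed partition count for a single shared topic


def partition_for(device_id: str) -> int:
    # Stand-in hash (CRC32), not Kafka's murmur2-based default partitioner.
    return zlib.crc32(device_id.encode("utf-8")) % NUM_PARTITIONS


def simulate(num_devices: int) -> dict[int, list[tuple[str, int]]]:
    """Route one reading per device into an in-memory model of the topic's partitions."""
    partitions: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for i in range(num_devices):
        device_id = f"device-{i}"
        partitions[partition_for(device_id)].append((device_id, i))
    return partitions


def readings_by_device(records: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Backend side: group records read from a partition by their key (device id)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for device_id, value in records:
        grouped[device_id].append(value)
    return grouped


if __name__ == "__main__":
    partitions = simulate(10_000)
    # 10,000 devices, but never more than NUM_PARTITIONS partitions.
    print(len(partitions), "partitions used for 10000 devices")
```

Growth in the number of devices only adds keys, not partitions, which is why this layout scales where a topic or partition per device does not.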

Thanks

Tom Crayford
Heroku Kafka


>
> Thanks,
> Abhaya
>
>
>
> On Mon, May 16, 2016 at 5:30 AM, Paolo Patierno <ppatierno@live.com> wrote:
>
> > I agree with Tom but ...
> >
> > ... to answer Christian's question, I guess that Anas thought about this
> > kind of solution because it makes it simple to read data from a specific
> > device from a backend service's point of view.
> >
> > Using one topic per device means that on the backend side you know exactly
> > which topic to read to get data from device X.
> > When you start using a topic with more partitions and a device key to
> > determine the destination partition, on the backend side you have to read
> > the entire topic (with data from different devices) and inspect the key
> > to understand which device is the sender.
> >
> > I'm only guessing that it could be the reason ...
> >
> > Btw ... +1 what Tom said
> >
> > Paolo Patierno
> > Senior Software Engineer (IoT) @ Red Hat
> > Microsoft MVP on Windows Embedded & IoT
> > Microsoft Azure Advisor
> > Twitter : @ppatierno
> > Linkedin : paolopatierno
> > Blog : DevExperience
> >
> > > Date: Mon, 16 May 2016 05:26:31 -0700
> > > Subject: Re: Increase number of topic in Kafka leads zookeeper fail
> > > From: christian.posta@gmail.com
> > > To: users@kafka.apache.org
> > >
> > > +1 what Tom said.
> > >
> > > Curious though, Anas: what motivated you to try a topic per device? Was
> > > there something regarding management or security that you believe you can
> > > achieve with a topic per device?
> > >
> > > On Mon, May 16, 2016 at 4:11 AM, Tom Crayford <tcrayford@heroku.com> wrote:
> > >
> > > > Hi there,
> > > >
> > > > Generally you don't use a single topic per device in this use case,
> > > > but one topic with some number of partitions, with keys distributed
> > > > by device id. Kafka isn't designed for millions of low-volume topics,
> > > > but for a few high-volume ones.
> > > >
> > > > Thanks
> > > >
> > > > Tom Crayford
> > > > Heroku Kafka
> > > >
> > > > On Mon, May 16, 2016 at 5:23 AM, Anas A <anas.24aj@gmail.com> wrote:
> > > >
> > > > > We plan to use Kafka as a message broker for an IoT use case, where
> > > > > each device is treated as a unique topic. When I simulated 10 messages
> > > > > per second to 10 thousand topics, ZooKeeper became a bottleneck, and
> > > > > because of that all the Kafka monitoring tools fail to read the
> > > > > throughput values and number of topics from the JMX port. Will tuning
> > > > > ZooKeeper solve the issues? In our IoT use case there will be millions
> > > > > of devices sending data to millions of topics. I want to make sure the
> > > > > approach is right before we go ahead.
> > > > > Please suggest.
> > > > >
> > > > >
> > > > > *Thanks & Regards,*
> > > > >
> > > > >
> > > > > Anas A
> > > > > DBA, Trinity Mobility
> > > > > +917736368236
> > > > > anas.24aj@gmail.com
> > > > > Bangalore
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > *Christian Posta*
> > > twitter: @christianposta
> > > http://www.christianposta.com/blog
> > > http://fabric8.io
> >
> >
>
