kafka-users mailing list archives

From Ralph Caraveo <decka...@gmail.com>
Subject Upper-bound on number of consumers
Date Thu, 09 Apr 2015 02:35:53 GMT
Hello Kafka Friends,

We are considering a use case where we'd like to have a Kafka cluster with
potentially thousands of partitions, using a hashed key on customer userids.
We have heard that Kafka can support thousands of partitions in a single
cluster, and I wanted to find out whether it's reasonable to have that many
partitions.
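
For concreteness, the producer side would look roughly like this: we key
each record by userid, so the default partitioner hashes the key onto one
of the partitions. (A minimal sketch using the standard Java client; the
topic name "customer-logs", the broker address, and the sample userid are
placeholders I made up.)

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class UserLogProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            // Keying each record by userid means the default partitioner
            // hashes the key, so all of one user's log data lands in the
            // same partition of the topic.
            producer.send(new ProducerRecord<>("customer-logs", "user-12345",
                                               "a log line"));
            producer.close();
        }
    }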

Additionally, we'd like to have potentially hundreds of thousands of
consumers, each consuming a fairly low volume of log data from these
partitions.  I'd also like to know whether having that many consumers is
reasonable or recommended with Kafka.

The scenario is something like this: we have 100,000 to 200,000 customers,
and we'd like to shard their data by userid into a cluster of, say, 4000
partitions.  We'd then run one consumer per userid to consume that user's
log data.

In this scenario we'd have (assuming 100,000 userids)

100,000 / 4000 = 25 consumers per partition, where each consumer reads
every offset in its partition and ignores any message whose key doesn't
belong to its assigned userid.
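
Concretely, each per-userid consumer would look roughly like the sketch
below.  It mirrors the default partitioner's murmur2 hash to work out which
partition its userid lands in, assigns itself to that partition, and
filters by key.  (Again a rough sketch with the Java consumer API; topic
and broker names are placeholders, and printing the value stands in for
real processing.)

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.utils.Utils;

    public class PerUserConsumer {
        public static void main(String[] args) {
            String userId = args[0];    // the one userid this consumer serves
            int numPartitions = 4000;

            // Mirror the default partitioner (murmur2 over the UTF-8 key
            // bytes) to find the partition this userid's data lands in.
            int partition = Utils.toPositive(
                    Utils.murmur2(userId.getBytes(StandardCharsets.UTF_8)))
                    % numPartitions;

            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.assign(Arrays.asList(
                    new TopicPartition("customer-logs", partition)));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    // Every consumer assigned to this partition reads every
                    // message and skips keys belonging to other userids.
                    if (userId.equals(record.key())) {
                        System.out.println(record.value());
                    }
                }
            }
        }
    }

The part that gives me pause is that last loop: all 25 consumers on a
partition read the same data, and each one throws most of it away.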

My gut tells me this may not be a sound design, because we'd need a ton of
open file descriptors and Kafka could incur a lot of overhead managing this
volume of consumers.

Any guidance is appreciated.  Mainly I'm just looking to find out whether
this is a reasonable use of Kafka or whether we need to go back to the
drawing board.

I appreciate any help!

-Ralph
