kafka-users mailing list archives

From Andras Beni <andrasb...@cloudera.com>
Subject Re: ConsumerOffsets uneven partition distribution
Date Tue, 20 Mar 2018 11:02:58 GMT
Hi Johnny,

As you already mentioned, the group.id determines which broker will be
the group coordinator.
You can change the group.id to change which __consumer_offsets partition the
group belongs to, and thus which broker (the leader of that partition) manages
the group. You can check which partition a group.id is assigned to using

Utils.abs(groupId.hashCode()) % partitionCount
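
To try this without a Kafka dependency, here is a minimal sketch. It assumes the broker derives the partition from the String hash code of the group.id, taken as a non-negative value, modulo the partition count of __consumer_offsets (offsets.topic.num.partitions, 50 by default). The class OffsetsPartition and the helper findGroupId are hypothetical names for illustration, not part of Kafka.

```java
public class OffsetsPartition {

    // Non-negative absolute value: Math.abs(Integer.MIN_VALUE) is still
    // negative, so that single input is mapped to 0.
    static int abs(int n) {
        return (n == Integer.MIN_VALUE) ? 0 : Math.abs(n);
    }

    // Assumed mapping from a group.id to a __consumer_offsets partition.
    static int partitionFor(String groupId, int partitionCount) {
        return abs(groupId.hashCode()) % partitionCount;
    }

    // Hypothetical helper: append numeric suffixes to a base group.id until
    // one maps to the wanted partition. Returns null if none is found.
    static String findGroupId(String baseGroupId, int wantedPartition,
                              int partitionCount, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            String candidate = baseGroupId + "-" + i;
            if (partitionFor(candidate, partitionCount) == wantedPartition) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumption: the default partition count; check your broker config.
        int partitionCount = 50;
        for (String groupId : args) {
            System.out.println(groupId + " -> partition "
                    + partitionFor(groupId, partitionCount));
        }
    }
}
```

Running it with your busiest group ids as arguments shows whether several of them share a partition. To see which broker that means, look up the current leader of each partition with kafka-topics.sh --describe --topic __consumer_offsets.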

A consumer group is a way to distribute work across equivalent consumers. I
would assume using one is a good idea, but it depends on your architecture
and use case.

Best regards,
Andras

On Sat, Mar 17, 2018 at 12:55 PM, Johnny Luo <johnnyl@campaignmonitor.com>
wrote:

> Hello,
>
> We are running a 16-node Kafka cluster on AWS; each node is an m4.xlarge
> EC2 instance with a 2 TB EBS (st1) disk. The Kafka version is 0.10.1.0, and
> we have about 100 topics at the moment. Some busy topics receive about
> 2 billion events every day, while some low-volume topics receive only
> thousands per day.
>
> Most of our topics use a UUID as the partition key when we produce
> messages, so the partitions are quite evenly distributed.
>
> We have quite a lot of consumers consuming from this cluster using consumer
> groups. Each consumer has a unique group id. Some consumer groups commit
> offsets every 500 ms; others commit offsets synchronously as soon as they
> finish processing a batch of messages.
>
> Recently we observed that some of the brokers are far busier than the
> others. With some digging, we found that quite a lot of traffic goes to
> "__consumer_offsets", so we created a tool to see the high watermark of
> each partition in "__consumer_offsets", which revealed that the partitions
> are very unevenly distributed.
>
> Based on this link "Consumer offset management in Kafka"
>
> It seems this is intended behaviour: each consumer group has only one
> leader, so all committed offsets need to go to this leader, and only the
> "group.id" is used to decide the partition.
>
> Given that we have some consumers consuming from those very busy topics,
> the offset commits cause a lot of traffic to the "__consumer_offsets" topic
> on the broker that handles the consumer group.
>
> My questions are:
> 1. Is there a way to make sure that the consumer groups that consume from
> busy topics don't fall onto the same broker? We don't want to create a
> hotspot.
> 2. For consumers that consume from busy topics (topics with billions of
> messages per day), is it a good idea to use a consumer group?
>
> Thanks in advance
>
> Johnny Luo
>
