kafka-users mailing list archives

From Johnny Lou <john...@campaignmonitor.com>
Subject Re: ConsumerOffsets uneven partition distribution
Date Wed, 21 Mar 2018 00:28:56 GMT
Hi Andras, 

   Thanks for that information. Handcrafting the group.id so that the groups spread across
brokers is one way to do it; I will give that a go. 
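
   A minimal sketch of how candidate group.id values could be checked before they are
deployed. It assumes the broker picks the "__consumer_offsets" partition as
abs(groupId.hashCode()) % partitionCount and that the topic has the default 50
partitions; if your broker version hashes differently, swap partitionFor accordingly.
The class name, the helper names and the sample group ids are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Kafka.
final class OffsetsPartitionSketch {

    // Same contract as Kafka's Utils.abs: Integer.MIN_VALUE maps to 0.
    static int abs(int n) {
        return n == Integer.MIN_VALUE ? 0 : Math.abs(n);
    }

    // Assumed broker-side rule for choosing the __consumer_offsets partition.
    static int partitionFor(String groupId, int partitionCount) {
        return abs(groupId.hashCode()) % partitionCount;
    }

    // Counts how many of the given group ids land on each partition.
    static Map<Integer, Integer> groupsPerPartition(String[] groupIds, int partitionCount) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String id : groupIds) {
            counts.merge(partitionFor(id, partitionCount), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int partitionCount = 50; // assumption: default offsets.topic.num.partitions
        String[] candidates = {"clicks-loader", "clicks-loader-v2", "opens-loader"};
        for (String id : candidates) {
            System.out.println(id + " -> partition " + partitionFor(id, partitionCount));
        }
        System.out.println(groupsPerPartition(candidates, partitionCount));
    }
}
```

The partition number still has to be mapped to a broker: the leader of that
"__consumer_offsets" partition, as listed in the describe output of kafka-topics.sh,
is the broker that handles the group.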

  I understand the benefit of consumer groups; my concern at the moment is the potential to
create a hot spot on one of the brokers...

Thanks, 

Johnny Luo

On 20/3/18, 10:03 pm, "Andras Beni" <andrasbeni@cloudera.com> wrote:

    Hi Johnny,
    
    As you already mentioned, the group.id determines which broker will be
    the group coordinator.
    You can change the group.id to modify which __consumer_offsets partition the
    group belongs to, and thus change which broker manages the group. You can
    check which partition a group.id is assigned to using
    
    Utils.abs(groupId.hashCode()) % partitionCount
    
    A consumer group is a way to distribute work across equivalent consumers. I
    would assume it is a good idea, but it depends on your architecture and use
    case.
    
    Best regards,
    Andras
    
    On Sat, Mar 17, 2018 at 12:55 PM, Johnny Luo <johnnyl@campaignmonitor.com>
    wrote:
    
    > Hello,
    >
    > We are running a 16-node Kafka cluster on AWS. Each node is an m4.xlarge
    > EC2 instance with a 2 TB EBS (st1) disk. The Kafka version is 0.10.1.0 and we
    > have about 100 topics at the moment. Some busy topics receive about 2 billion
    > events every day; some low-volume topics receive only thousands per day.
    >
    > Most of our topics use a UUID as the partition key when we produce
    > messages, so the partitions are quite evenly distributed.
    >
    > We have quite a lot of consumers consuming from this cluster using consumer
    > groups. Each consumer has a unique group id. Some consumer groups commit
    > offsets every 500 ms; others commit offsets synchronously as soon as they
    > finish processing a batch of messages.
    >
    > Recently we observed that some of the brokers are far busier than the
    > others. After some digging, we found that quite a lot of the traffic goes
    > to "__consumer_offsets", so we created a tool to read the high watermark
    > of each partition in "__consumer_offsets". It reveals that the load is
    > spread very unevenly across the partitions.
    >
    > Based on this link "Consumer offset management in Kafka"
    >
    > It seems this is intended behaviour: each consumer group has only one
    > coordinator, so all of its committed offsets go to that broker, and only
    > the “group.id” is used to decide the partition.
    >
    > Since some of our consumers read from those very busy topics, their
    > offset commits cause a lot of traffic to the "__consumer_offsets" topic
    > on the broker that handles the consumer group.
    >
    > My questions are:
    > 1. Is there a way to make sure that the consumer groups that consume
    > from busy topics don't fall onto the same broker? We don't want to create
    > a hotspot.
    > 2. For consumers that consume from busy topics (topics with billions of
    > messages per day), is it a good idea to use a consumer group?
    >
    > Thanks in advance
    >
    > Johnny Luo
    >
    