kafka-users mailing list archives

From Srikanth <srikanth...@gmail.com>
Subject Hash partition of key with skew
Date Wed, 27 Apr 2016 19:05:33 GMT
Hello,

Is there a recommendation for handling producer side partitioning based on
a key with skew?
We want to partition on something like clientId. The problem is that this key
has a non-uniform distribution.
It's equally likely to see a key with 3k occurrences/day as one with 100k/day
or 65 million/day.
Cardinality of key is around 1500 and there are approx 1 billion records
per day.
Partitioning by hashcode(key)%numOfPartition will create a few "hot
partitions" and cause a few brokers (and consumer threads) to be overloaded.
Maybe these heavily loaded partitions will be evenly distributed among
brokers, maybe not.
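
To make the hot-partition concern concrete, here is a minimal sketch that sums per-key daily counts into partitions using hashcode(key)%numOfPartition. The class name, the method name, and the three sample keys are made up for illustration (the counts are the ones quoted above), and String.hashCode() only stands in for whatever hash the producer really applies; the default partitioner in the Java producer hashes the serialized key bytes with murmur2 instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Kafka: estimates per-partition load
// from offline per-key record counts under plain hash partitioning.
class HashSkewEstimate {

    // Sum the daily record counts of all keys that hash to each partition.
    static long[] partitionLoads(Map<String, Long> recordsPerKey, int numPartitions) {
        long[] loads = new long[numPartitions];
        for (Map.Entry<String, Long> e : recordsPerKey.entrySet()) {
            // floorMod keeps the index non-negative even when hashCode() is negative.
            int p = Math.floorMod(e.getKey().hashCode(), numPartitions);
            loads[p] += e.getValue();
        }
        return loads;
    }

    public static void main(String[] args) {
        // Illustrative keys; the counts are the three volumes mentioned in the post.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("client-a", 3_000L);
        counts.put("client-b", 100_000L);
        counts.put("client-c", 65_000_000L);

        long[] loads = partitionLoads(counts, 8);
        for (int p = 0; p < loads.length; p++) {
            System.out.println("partition " + p + ": " + loads[p]);
        }
    }
}
```

Whichever partition receives the 65 million/day key carries at least that much load no matter how the other keys land, which is the hot-partition effect described above.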

I read KIP-22
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-22+-+Expose+a+Partitioner+interface+in+the+new+producer>,
which explains how one could write a custom partitioner.
I'd like to know how it has been used to handle this kind of data skew.
We can compute some statistics on key distribution offline and use it in
the partitioner.
Is that a good idea? Or is it way too much logic for a partitioner?
Anything else to consider?
Any thoughts or references would be helpful.
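
One possible shape for the "offline statistics in the partitioner" idea is sketched below. It is only a sketch under stated assumptions, not an established Kafka recipe: the class name, the fair-share threshold (total records / number of partitions), and the round-robin spreading are all hypothetical choices. The class is kept free of Kafka dependencies; in practice its partitionFor method would be called from the partition(...) method of a class implementing org.apache.kafka.clients.producer.Partitioner.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: spreads keys whose offline count exceeds a fair share
// over several partitions, and hashes every other key as usual.
class SkewAwareAssigner {
    private final int numPartitions;
    // For each hot key, the number of partitions it is spread across.
    private final Map<String, Integer> fanout = new HashMap<>();
    private final AtomicLong counter = new AtomicLong();

    SkewAwareAssigner(Map<String, Long> recordsPerKey, int numPartitions) {
        this.numPartitions = numPartitions;
        long total = 0;
        for (long c : recordsPerKey.values()) total += c;
        // Assumed threshold: the load one partition would carry if perfectly balanced.
        long fairShare = Math.max(1, total / numPartitions);
        for (Map.Entry<String, Long> e : recordsPerKey.entrySet()) {
            if (e.getValue() > fairShare) {
                long needed = (e.getValue() + fairShare - 1) / fairShare; // ceiling division
                fanout.put(e.getKey(), (int) Math.min(numPartitions, needed));
            }
        }
    }

    int partitionFor(String key) {
        int base = Math.floorMod(key.hashCode(), numPartitions);
        int n = fanout.getOrDefault(key, 1);
        if (n == 1) return base; // ordinary key: always the same partition
        // Hot key: rotate over n consecutive partitions starting at base.
        int offset = (int) Math.floorMod(counter.getAndIncrement(), (long) n);
        return (base + offset) % numPartitions;
    }
}
```

The trade-off to consider is that a key spread this way no longer maps to a single partition, so per-key ordering and any consumer logic that expects all records for a clientId in one place would be lost for the hot keys.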

Thanks,
Srikanth
