kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Friedman <...@flurry.com>
Subject Consumer throughput imbalance
Date Sat, 24 Aug 2013 16:57:30 GMT
Hey guys! We recently deployed our kafka data pipeline application over the weekend and it
is working out quite well once we ironed out all the issues. There is one behavior that we've
noticed that is mildly troubling, though not a deal breaker. We're using a single topic with
many partitions (1200 total) to load balance our 300 consumers, but what seems to happen is
that some partitions end up more backed up than others. This is probably due more to the specifics
of the application since some messages take much longer than others to process. 

I'm thinking that the random partitioning in the producer is unsuited to our specific needs.
One option I was considering was to write an alternate partitioner that looks at the consumer
offsets from zookeeper (as in the ConsumerOffsetChecker) and probabilistically weights the
partitions by their lag. Does this sound like a good idea to anyone else? Is there a better
or preferably already built solution? If anyone has any ideas or feedback I'd sincerely appreciate
it.

Thanks so much in advance.

P.S. thanks especially to everyone who's answered my dumb questions on this mailing list over
the past few months, we couldn't have done it without you! 

-- 
Ian Friedman


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message