kafka-users mailing list archives

From Ian Friedman <...@flurry.com>
Subject Re: Consumer throughput imbalance
Date Mon, 26 Aug 2013 18:33:59 GMT
On Sunday, August 25, 2013 at 3:11 PM, Jay Kreps wrote:
> I'm still a little confused by your description of the problem. It might be
> easier to understand if you listed out the exact things you have measured,
> what you saw, and what you expected to see.

The problem is that some consumers are slower than others, due to factors such as
resource contention on the box itself and on our HBase cluster, and the actual processing
each consumer is doing. We are sending very small messages that are actually HDFS paths, which
the consumers then open and read. Each of these files takes between 1 and 15 minutes to process,
and sometimes up to 30 minutes when the load on our HBase cluster is very high from
certain MR jobs. We were hoping to get some experience with Kafka and flush out any issues
with our use of the project before implementing a solution that actually queued all the data
in those HDFS files to Kafka itself, and this seemed like a good intermediate step.
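
A minimal sketch of how per-file processing times in the 1 to 15 minute range can leave consumers unevenly loaded. It is an illustration only, not the consumer code from this setup: the function name `simulate`, the consumer and file counts, the uniform distribution of processing times, and the round-robin pinning of files to consumers (a stand-in for a fixed partition-to-consumer assignment) are all assumptions.

```python
import random

def simulate(num_consumers=4, num_files=40, seed=0):
    """Return total busy minutes per consumer under a fixed, up-front assignment."""
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    busy = [0] * num_consumers
    for i in range(num_files):
        # Assumed: each file takes a uniformly random 1-15 minutes to process.
        minutes = rng.randint(1, 15)
        # Assumed: file i is pinned to a consumer in advance (round-robin),
        # so a consumer that draws several slow files cannot hand work off.
        busy[i % num_consumers] += minutes
    return busy

busy = simulate()
print("busy minutes per consumer:", busy)
print("slowest minus fastest:", max(busy) - min(busy))
```

Every consumer receives the same number of files here, so any gap between the slowest and fastest consumer comes purely from the variation in per-file processing time. Contention on a particular box or on HBase would widen that gap further.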
