kafka-users mailing list archives

From Min Yu <mini...@gmail.com>
Subject Re: multiple Hadoop consumer tasks per partition
Date Mon, 17 Sep 2012 23:54:41 GMT
If you want to run a Mapper job per partition,

https://github.com/miniway/kafka-hadoop-consumer

might help.

Thanks
Min

On Sep 18, 2012, at 6:51 AM, Matthew Rathbone <matthew@foursquare.com> wrote:

> Hey guys,
>
> I've been using the hadoop consumer a whole lot this week, but I'm seeing
> pretty poor throughput with one task per partition. I figured a good
> solution would be to have multiple tasks per partition, so I wanted to run
> my assumptions by you all first:
>
> This should enable the broker to round robin events between tasks right?
>
> When I record the high-watermark at the end of the mapreduce job there will
> be N entries for each partition (one per task), so is it correct to just
> take max(watermarks)?
> -- my assumption is that as they're getting events round-robin, everything
> should have been consumed up to the highest watermark found. Does this hold
> true?
>
> Is anyone else using the consumer like this?
>
>
>
> --
> Matthew Rathbone
> Foursquare | Software Engineer | Server Engineering Team
> matthew@foursquare.com | @rathboma <http://twitter.com/rathboma> |
> 4sq<http://foursquare.com/rathboma>
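
For reference, the max(watermarks) step described in the question could be sketched roughly as below. This is only an illustration: the (partition, offset) pair layout and the name merge_watermarks are assumptions made for the example, not part of the Hadoop consumer. Taking the max is only safe if every task really has consumed everything below the highest offset, which nothing in this thread confirms.

```python
def merge_watermarks(entries):
    """Collapse (partition, offset) pairs into one max offset per partition.

    `entries` is a hypothetical list of the N per-task watermark records
    written at the end of the MapReduce job.
    """
    merged = {}
    for partition, offset in entries:
        # Keep the highest offset seen so far for this partition.
        if partition not in merged or offset > merged[partition]:
            merged[partition] = offset
    return merged


if __name__ == "__main__":
    # Made-up sample records: three tasks on partition 0, one on partition 1.
    sample = [("topic-0", 120), ("topic-0", 340), ("topic-1", 75), ("topic-0", 290)]
    print(merge_watermarks(sample))
```

If the tasks do not actually split the partition's events between them, a safer reduction might be the min rather than the max, at the cost of re-reading some events on the next run.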
