hadoop-mapreduce-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Shuffle phase: fine-grained control of data flow
Date Wed, 07 Nov 2012 14:05:55 GMT
Hi Jiwei,

In trunk (i.e. MR2), the logic that selects and schedules map completion
events lives in the getMapCompletionEvents() method of the EventFetcher
class, which you can view at http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java?view=markup

The Shuffle class (in the reduce package) uses this EventFetcher thread
to keep fetching completion events while the shuffle runs. The
ReduceTask class (in the mapred package of the same Maven module) in
turn uses the Shuffle class.

I guess you can start there, to see if a better selection+scheduling
logic would yield better results.
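
To make the idea concrete, here is a rough, standalone sketch of what a
locality-aware ordering of completed map outputs could look like. It does
not use any Hadoop API: the MapOutput type, the host/rack strings, and the
fetchOrder() and distance() helpers are all hypothetical names invented
for illustration, and how a reducer would actually learn the rack of each
map host is left as an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch only: none of these types are Hadoop classes.
final class LocalityOrderSketch {

    /** Hypothetical stand-in for "a map output is ready on this host". */
    static final class MapOutput {
        final String mapId;
        final String host;
        final String rack;

        MapOutput(String mapId, String host, String rack) {
            this.mapId = mapId;
            this.host = host;
            this.rack = rack;
        }
    }

    /** Coarse network distance: 0 = same host, 1 = same rack, 2 = off rack. */
    static int distance(MapOutput out, String reducerHost, String reducerRack) {
        if (out.host.equals(reducerHost)) {
            return 0;
        }
        if (out.rack.equals(reducerRack)) {
            return 1;
        }
        return 2;
    }

    /**
     * Returns the map ids in the order a reducer on (reducerHost, reducerRack)
     * might fetch them: closest outputs first. List.sort is stable, so outputs
     * at the same distance keep their arrival order.
     */
    static List<String> fetchOrder(List<MapOutput> ready,
                                   String reducerHost,
                                   String reducerRack) {
        List<MapOutput> sorted = new ArrayList<>(ready);
        sorted.sort(Comparator.comparingInt(
                (MapOutput o) -> distance(o, reducerHost, reducerRack)));

        List<String> ids = new ArrayList<>();
        for (MapOutput o : sorted) {
            ids.add(o.mapId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<MapOutput> ready = new ArrayList<>();
        ready.add(new MapOutput("m0", "hostC", "rack2"));
        ready.add(new MapOutput("m1", "hostB", "rack1"));
        ready.add(new MapOutput("m2", "hostA", "rack1"));
        // Print the fetch order for a reducer running on hostA in rack1.
        System.out.println(fetchOrder(ready, "hostA", "rack1"));
    }
}
```

Whether an ordering like this actually reduces cross-rack traffic would
need to be measured; it only changes which outputs are fetched first, not
where the reduce task itself is placed.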

On Wed, Nov 7, 2012 at 12:26 PM, Jiwei Li <cxm170@gmail.com> wrote:
> Dear all,
> For jobs like Sort, massive amounts of network traffic happen during
> shuffle phase. The simple mechanism in Hadoop 1.0.4 to choose reduce nodes
> does not help reduce network traffic. If JobTracker is fully aware of
> locations of every map output, why not take advantage of this topology
> knowledge?
> So, is there anyone who knows where to develop such codes upon? Many thanks.
> Regards.
> --
> Jiwei

Harsh J
