hadoop-mapreduce-dev mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: The idea to enhance MapReduce to resolve the skew problem
Date Thu, 04 Feb 2010 09:55:54 GMT

Do you mean resplitting and recombining within each mapper task? I am not sure
what the purpose is. As I understand it, the Partitioner determines which
reducer each mapper output record goes to, so I don't think your method can
solve the skew problem.
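
To illustrate the point about the Partitioner, here is a minimal sketch in
plain Java with no Hadoop dependency. The partitionFor arithmetic is the same
as in Hadoop's default HashPartitioner; the class name, the helper
recordsPerReducer, and the sample keys are hypothetical and exist only for
illustration. Every record with the same key lands on the same reducer, so
regrouping bytes inside a mapper does not change where a hot key ends up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionSkewSketch {
    // Same arithmetic as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder by the reducer count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical helper: counts how many records each reducer would receive.
    static long[] recordsPerReducer(List<String> keys, int numReduceTasks) {
        long[] counts = new long[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up map output: one hot key plus a few rare keys.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("hot");
        }
        keys.add("a");
        keys.add("b");
        keys.add("c");
        // One reducer receives all 1000 "hot" records regardless of
        // how the mapper-side output was split.
        System.out.println(Arrays.toString(recordsPerReducer(keys, 4)));
    }
}
```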

2010/2/4 易剑 <myhadoop@gmail.com>

> Currently, only map tasks are balanced; reduce tasks may be skewed and
> their time slices differ, which keeps the scheduler from being smart. I
> have an idea to improve this.
> We can break the map output into N*M splits, where N is the number of nodes
> and M >= 1, then regroup them into new splits by combining the smaller
> splits and resplitting the bigger ones, until the size of every split is
> balanced against a specified value.
> There are three cases:
> 1. Too many values for a key
> 2. Too many keys hash to a partition
> 3. Every partition is balanced in size
> If there are too many values for a key, an additional MapReduce pass is
> necessary.
> If too many keys hash to a partition, resplitting is necessary.
> If every split is balanced, we can treat a task (map or reduce) as one
> scheduler timeslice, and the scheduler can be as smart as an OS scheduler.
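
For reference, the regrouping step in the quoted proposal could look roughly
like the sketch below. It is only one possible reading: the class name,
the rebalance method, and the targetSize parameter are hypothetical, and it
treats split sizes as freely divisible byte counts. It ignores key
boundaries, so it does not handle case 1 (too many values for a single key).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitRebalanceSketch {
    // Regroups split sizes so that no output split exceeds targetSize.
    // Assumption: bytes can be moved freely between splits.
    static List<Long> rebalance(List<Long> splitSizes, long targetSize) {
        if (targetSize <= 0) {
            throw new IllegalArgumentException("targetSize must be positive");
        }
        List<Long> result = new ArrayList<>();
        long pending = 0; // leftover bytes waiting to be combined
        for (long size : splitSizes) {
            // Resplit: carve full target-size pieces off a big split.
            while (size >= targetSize) {
                result.add(targetSize);
                size -= targetSize;
            }
            // Combine: merge the remainder with earlier leftovers.
            pending += size;
            while (pending >= targetSize) {
                result.add(targetSize);
                pending -= targetSize;
            }
        }
        if (pending > 0) {
            result.add(pending); // final, possibly smaller, split
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(10L, 250L, 30L, 5L);
        System.out.println(rebalance(sizes, 100L));
    }
}
```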

Best Regards

Jeff Zhang
