mahout-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: Combiner applied on multiple map task outputs (like in Mahout SVD)
Date Thu, 27 Sep 2012 09:18:23 GMT
Jake is absolutely right here: the combiner is also applied on the
reduce side. I forgot to mention that.

The shuffle phase in Hadoop is basically a distributed merge-sort. When
the reducers start to merge the mapper outputs, they can also apply the
combiner. However, this doesn't help with reducing network traffic,
because by then the map outputs have already been copied to the reducer.
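
To make the point concrete, here is a toy simulation in plain Python. It
is only a sketch of the idea, not Hadoop's actual implementation: the
function names (combine, map_side, reduce_side) and the sample data are
made up for illustration, and the combiner is assumed to be a simple sum.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def combine(sorted_records):
    """Sum-combiner: collapse a key-sorted stream of (key, count) pairs."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(sorted_records, key=itemgetter(0))]


def map_side(records):
    """Sort one mapper's output and combine it before it leaves the node."""
    return combine(sorted(records))


def reduce_side(map_outputs):
    """Merge the already-sorted map outputs and combine while merging."""
    return combine(heapq.merge(*map_outputs))


# Hypothetical raw output of two map tasks (word-count style).
mapper_a = [("x", 1), ("y", 1), ("x", 1)]
mapper_b = [("x", 1), ("z", 1), ("z", 1)]

# Map-side combining shrinks what has to be shipped to the reducer.
shipped = [map_side(mapper_a), map_side(mapper_b)]
records_over_network = sum(len(output) for output in shipped)

# Reduce-side combining happens after the copy, during the merge.
merged = reduce_side(shipped)
```

The six raw records shrink to four before they are shipped, and that
count is fixed by the time the reduce-side merge runs. Combining during
the merge only shrinks what the reducer has to read afterwards.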

The section 'Shuffle and Sort' in 'Hadoop: The Definitive Guide'
describes this process in detail.

--sebastian


On 27.09.2012 11:11, Sigurd Spieckermann wrote:
> OK, I see. Makes sense. Thank you!
> 
> 2012/9/27 Sean Owen <srowen@gmail.com>
> 
>> I think he means that it is not only applied to the output of the
>> mapper, but to output of the combiners many times as well. It is not
>> used at the reducer.
>>
>> On Thu, Sep 27, 2012 at 9:56 AM, Sigurd Spieckermann
>> <sigurd.spieckermann@gmail.com> wrote:
>>> @Jake: Could you please elaborate on how exactly the combiner can be
>>> called before the reducer gets the data? Do you mean the combiner is
>>> called at the datanode that instantiates reducer tasks? I thought the
>>> combiner is just called after the map task has finished and still on
>>> that datanode.
>>>
>>> 2012/9/26 Jake Mannix <jake.mannix@gmail.com>
>>>
>>>> It should also be noted that the Combiner does not only run for the
>>>> mappers - they can be used one (or more) times after mapping, and then
>>>> one or more times before the reducer gets the results.  It's not quite
>>>> so simple as to say that you get combiners used only (and always) on
>>>> the outputs of each map task.
>>
> 

