hadoop-mapreduce-dev mailing list archives

From Mahesh Balija <balijamahesh....@gmail.com>
Subject Re: running Combiner for all the task on the node.
Date Wed, 02 Jan 2013 12:50:52 GMT
Continued,

Also, a node-level combine would need one more shuffle and sort phase, so
that the outputs of the different map tasks can be merged/combined properly.
So you should decide whether that additional shuffle and sort phase costs
more than the per-node combine would save.
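
To make the trade-off concrete, here is a rough sketch in plain Python (not
Hadoop code). It assumes a word-count style combiner that sums values per key;
the three map outputs and the names combine, map_outputs, per_task and
node_level are made up for illustration. It only counts records and says
nothing about the real cost of the extra sort.

```python
from collections import defaultdict


def combine(records):
    """Sum values per key, the way a word-count style combiner would."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    # Sorting stands in for the grouping by key that a combine step needs.
    return sorted(totals.items())


# Hypothetical output of three map tasks that ran on the same node.
map_outputs = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("c", 1)],
    [("b", 1), ("b", 1), ("a", 1)],
]

# Per-task combine: each map task's output is aggregated on its own.
per_task = [combine(output) for output in map_outputs]
per_task_records = sum(len(output) for output in per_task)

# Node-level combine (the proposal): gather the per-task results,
# group them by key once more, and combine again.
node_level = combine([pair for output in per_task for pair in output])

print("raw records:        ", sum(len(o) for o in map_outputs))
print("after per-task:     ", per_task_records)
print("after node-level:   ", len(node_level))
```

In this toy input the node-level pass halves the record count again, but only
after a second grouping by key over data that came from separate tasks, which
is the extra sort/merge cost mentioned above.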

Best,
Mahesh Balija,
Calsoft Labs.

On Wed, Jan 2, 2013 at 6:14 PM, Mahesh Balija <balijamahesh.mca@gmail.com> wrote:

> Hi Suresh,
>
>                The combiner function aggregates the data from a single
> map instance, NOT from all the maps running on a given node.
>                AFAIK, since the maps run in individual child JVMs, the
> intermediate data would still need to be serialized (moved) so that
> a combiner could aggregate it at node level.
>
> Best,
> Mahesh Balija,
> Calsoft Labs.
>
>
>
> On Wed, Jan 2, 2013 at 5:53 PM, Suresh S <sureshhot@gmail.com> wrote:
>
>> Hello,
>>
>>       I think running the combiner function at node level (to combine the
>> output of all the map tasks on the node) may reduce the intermediate data
>> movement.
>>
>>      I don't know whether this technique is already available. Is it worth
>> working in this direction?
>> Any suggestions? Thanks in advance.
>> *Regards*
>> *S.Suresh,*
>> *Research Scholar,*
>> *Department of Computer Applications,*
>> *National Institute of Technology,*
>> *Tiruchirappalli - 620015.*
>> *+91-9941506562*
>>
>
>
