hadoop-mapreduce-dev mailing list archives

From Knowledge gatherer <knowledge.gatherer....@gmail.com>
Subject Re: Sorting in Mapper to Reducer
Date Mon, 26 May 2014 12:04:06 GMT
Thanks a lot. It was really helpful.


On Sat, May 24, 2014 at 8:30 PM, Pedro Dusso <pmdusso@gmail.com> wrote:

> I believe some good web resources are:
>
>    - http://www.slideshare.net/cloudera/mr-perf
>    - http://gbif.blogspot.de/2011/01/setting-up-hadoop-cluster-part-1-manual.html
>      (look at "The Map Side" section)
>    - This chapter from T. White's Hadoop book:
>      https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
>    - Explanation about the Map Task:
>      http://codrspace.com/b441berith/hadoop-maptask-inside/
>
>
> Basically, the keys emitted from the map function are accumulated in an
> in-memory buffer (MapOutputBuffer class). When the buffer gets full, the
> keys are sorted first by partition and, within each partition, by key, and
> are then written to a temporary file called a spill. The in-memory sorting
> algorithm used is quicksort. When the map task has finished processing its
> input split, there may be many spills, which must be merged into
> a single file so that it is available to the reduce tasks.
>
> Best,
>
> Dusso
>
>
> 2014-05-24 16:10 GMT+02:00 Knowledge gatherer <
> knowledge.gatherer.007@gmail.com>:
>
> > Hi,
> >
> >   I want to know how the sort happens in ascending order, whenever the keys
> > from mappers are emitted to reducer.
> >
> > What algorithm is used?
> >
> > Any links or guidelines will be of real help.
> >
> > Thanks in Advance.
> >
>
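
The map-side flow described in the quoted reply (buffer, sort by partition and then by key, spill, merge the spills) can be sketched in plain Java as below. This is only an illustration under stated assumptions, not Hadoop's actual code: the class and method names (SpillSortSketch, Rec, mapSide, merge) are hypothetical, a small record count stands in for the real byte-based buffer, spills are kept in memory instead of being written to temporary files, and List.sort (a merge sort) replaces the quicksort mentioned above. The partition formula mirrors the one used by Hadoop's default HashPartitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class SpillSortSketch {
    /** One buffered map output record (hypothetical type, for illustration only). */
    static final class Rec {
        final int partition;
        final String key;
        Rec(int partition, String key) { this.partition = partition; this.key = key; }
    }

    /** Order by partition first, then by key within the partition. */
    static final Comparator<Rec> ORDER = (a, b) -> {
        int c = Integer.compare(a.partition, b.partition);
        return c != 0 ? c : a.key.compareTo(b.key);
    };

    /** Hash partitioning in the style of Hadoop's default HashPartitioner. */
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Buffers keys and emits a sorted spill whenever the toy capacity is reached. */
    static List<List<Rec>> mapSide(List<String> keys, int numReducers, int capacity) {
        List<List<Rec>> spills = new ArrayList<>();
        List<Rec> buffer = new ArrayList<>();
        for (String k : keys) {
            buffer.add(new Rec(partitionFor(k, numReducers), k));
            if (buffer.size() >= capacity) {
                spills.add(sortedCopy(buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            spills.add(sortedCopy(buffer)); // final, partially filled spill
        }
        return spills;
    }

    static List<Rec> sortedCopy(List<Rec> buffer) {
        List<Rec> run = new ArrayList<>(buffer);
        run.sort(ORDER); // assumption: stand-in for the in-memory quicksort
        return run;
    }

    /** K-way merge of already sorted spills into one sorted output. */
    static List<Rec> merge(List<List<Rec>> spills) {
        // Each heap entry is {spill index, offset into that spill}.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) ->
            ORDER.compare(spills.get(a[0]).get(a[1]), spills.get(b[0]).get(b[1])));
        for (int i = 0; i < spills.size(); i++) {
            if (!spills.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<Rec> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            List<Rec> spill = spills.get(head[0]);
            out.add(spill.get(head[1]));
            if (head[1] + 1 < spill.size()) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys =
            Arrays.asList("pear", "apple", "fig", "kiwi", "apple", "date", "plum");
        List<List<Rec>> spills = mapSide(keys, 2, 3);
        for (Rec r : merge(spills)) {
            System.out.println(r.partition + "\t" + r.key);
        }
    }
}
```

Because every spill is already sorted by (partition, key), the merge only needs to compare the current head of each spill, which is why the final step is a merge and not another full sort. Each reducer then reads the contiguous range of the merged output that belongs to its partition.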
