hadoop-mapreduce-dev mailing list archives

From Ling Kun <lkun.e...@gmail.com>
Subject Re: Why In-memory Mapoutput is necessary in ReduceCopier
Date Tue, 12 Mar 2013 01:42:58 GMT
Dear Ravi and all,

   Thanks very much for your kind reply.
   I am currently wondering whether it is possible to eliminate the
HTTP GET method in some other way, and so far I have not found a better
approach.
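
   For context, a rough sketch of the fetch I mean is below. The URL layout,
parameter names, and class names here are illustrative assumptions for the
discussion, not copied from ReduceTask.java:

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ShuffleFetchSketch {

    // Build the URL a reducer would GET from the node serving map output.
    // The query-parameter names are hypothetical placeholders.
    static String buildShuffleUrl(String host, int port,
                                  String jobId, String mapId, int reduce) {
        return "http://" + host + ":" + port + "/mapOutput"
                + "?job=" + jobId + "&map=" + mapId + "&reduce=" + reduce;
    }

    // Fetch the map output bytes over HTTP -- the step in question.
    // Shown only for illustration; requires a live server to run.
    static byte[] fetch(String url) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        System.out.println(buildShuffleUrl("tt-host", 50060,
                "job_201303110001_0001", "map_000000", 3));
    }
}
```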

   Thanks again.

Ling Kun

On Tue, Mar 12, 2013 at 12:58 AM, Ravi Prakash <ravihoo@ymail.com> wrote:

> Hi Ling,
> Yes! It is because of performance concerns. We want to keep and merge map
> outputs in memory as much as we can. The amount of memory reserved for this
> purpose is configurable. Obviously storing fetched map outputs on disk,
> then reading them back from disk to merge them, and then writing the merged
> result back to disk, is a lot more expensive than doing it all in memory.
> Please let us know if you find there was an opportunity to keep the map
> output in memory but we did not, and instead shuffled to disk.
> Thanks
> Ravi
> ________________________________
>  From: Ling Kun <lkun.erlv@gmail.com>
> To: mapreduce-dev@hadoop.apache.org
> Sent: Monday, March 11, 2013 5:27 AM
> Subject: Why In-memory Mapoutput is necessary in ReduceCopier
> Dear all,
>      I am focusing on the MapOutput copier implementation. This part of
> the code fetches map outputs and merges them into a file that can be fed
> to the reduce function. I have the following questions.
> 1. All the local-file map output data will be merged together by
> LocalFSMerger, and the in-memory map output will be merged by
> InMemFSMergeThread. For the InMemFSMergeThread, there is also a writer
> object which writes the result to outputPath (ReduceTask.java Line 2843).
> It seems that after merging, both the in-memory and local-file map output
> data end up stored in the local file system. Why not just use the local
> file system for all map output data?
> 2. After using HTTP to fetch a fragment of a map output file, some of the
> map output data is selected and kept in memory, while the rest is written
> directly to the reducer's local disk. Which map outputs are kept in memory
> is determined in MapOutputCopier.getMapOutput(), which calls
> ramManager.canFitInMemory(). Why not store all the data on disk?
> 3. According to the comments, Hadoop will keep a file in memory if:
> a, the size of the (decompressed) file is less than 25% of the total
> in-mem fs; b, there is space available in the in-mem fs. Why? Is it
> because of performance?
> Thanks
> yours,
> Ling Kun
> --
> http://www.lingcc.com
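
The admission check that questions 2 and 3 ask about can be sketched roughly
as follows; the method and field names are hypothetical placeholders, not the
actual ramManager API from ReduceTask.java:

```java
public class RamFitSketch {

    // A single segment may use at most this fraction of the in-memory
    // budget, per the 25% rule quoted in question 3.
    static final double MAX_SINGLE_SHUFFLE_FRACTION = 0.25;

    // Keep a fetched map output in memory only if it is under the
    // single-segment cap AND it still fits in the remaining space;
    // otherwise it is shuffled straight to local disk.
    static boolean canFitInMemory(long requestedSize,
                                  long usedSize,
                                  long maxInMemSize) {
        boolean smallEnough = requestedSize
                < (long) (maxInMemSize * MAX_SINGLE_SHUFFLE_FRACTION);
        boolean spaceAvailable = usedSize + requestedSize <= maxInMemSize;
        return smallEnough && spaceAvailable;
    }

    public static void main(String[] args) {
        // 20 of a 100-unit budget, 50 already used -> kept in memory
        System.out.println(canFitInMemory(20, 50, 100)); // true
        // 30 exceeds the 25% single-segment cap -> goes to disk
        System.out.println(canFitInMemory(30, 0, 100)); // false
    }
}
```

Keeping small segments in memory this way lets InMemFSMergeThread merge them
without a disk round trip, which is the performance rationale Ravi describes.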

