hadoop-common-issues mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper
Date Wed, 11 Jan 2017 20:55:16 GMT
Zheng Shao created HADOOP-13975:

             Summary: Allow DistCp to use MultiThreadedMapper
                 Key: HADOOP-13975
                 URL: https://issues.apache.org/jira/browse/HADOOP-13975
             Project: Hadoop Common
          Issue Type: New Feature
          Components: tools/distcp
    Affects Versions: 3.0.0-alpha1
            Reporter: Zheng Shao
            Assignee: Zheng Shao
            Priority: Minor

Although DistCp lets users control parallelism through the number of mappers, it is sometimes
desirable to run fewer mappers with more threads per mapper. DistCp is network bound, limited
either by throughput or, more often, by the latency of creating connections and of opening,
reading/writing, and closing files. Running several threads per mapper can therefore make each
mapper much more efficient.

With fewer mappers, the threads inside each mapper can share resources, which saves memory and reduces the number of connections to the NameNode.
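A rough illustration of the latency argument follows. It is a Python sketch, not DistCp code and not Hadoop's API. `SharedClient`, `run_mapper` and the simulated per-file latency are hypothetical stand-ins. One "mapper" opens a single client and fans its input split out over a thread pool, so the latency-bound round trips overlap while the client (standing in for a NameNode connection) is shared. For comparison, Hadoop already ships `org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper`, which runs a wrapped mapper class on a configurable number of threads (`MultithreadedMapper.setNumberOfThreads`). This issue proposes letting DistCp use that approach.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SharedClient:
    """Hypothetical stand-in for a filesystem/NameNode client that one
    mapper opens once and shares across all of its worker threads."""

    def __init__(self, latency_s: float = 0.02):
        self.latency_s = latency_s
        self._lock = threading.Lock()
        self.calls = 0

    def copy(self, src: str, dst: str) -> str:
        # Simulate a latency-bound open/read/write/close round trip.
        # time.sleep releases the GIL, so threads overlap here.
        time.sleep(self.latency_s)
        with self._lock:
            self.calls += 1
        return dst


def run_mapper(paths, num_threads, client):
    """Copy every path in this (hypothetical) input split using
    num_threads workers that all share the same client."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(lambda p: client.copy(p, "/dst" + p), paths))


if __name__ == "__main__":
    client = SharedClient()
    split = [f"/src/part-{i}" for i in range(16)]
    for threads in (1, 8):
        start = time.perf_counter()
        run_mapper(split, threads, client)
        print(f"{threads} thread(s): {time.perf_counter() - start:.2f}s")
```

With 16 files at about 20 ms each, the single-threaded run is bounded below by roughly 0.32 s. The 8-thread run overlaps the waits and needs only about two latency periods, all through one shared client rather than one client per extra mapper.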

This message was sent by Atlassian JIRA
