hadoop-common-issues mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper
Date Thu, 12 Jan 2017 00:27:16 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-13975:
    Attachment: HADOOP-distcp-multithreaded-mapper-trunk.3.patch

Fixed checkstyle issues.

> Allow DistCp to use MultiThreadedMapper
> ---------------------------------------
>                 Key: HADOOP-13975
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13975
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: tools/distcp
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>         Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, HADOOP-distcp-multithreaded-mapper-branch26.2.patch,
>                      HADOOP-distcp-multithreaded-mapper-branch26.3.patch, HADOOP-distcp-multithreaded-mapper-trunk.1.patch,
>                      HADOOP-distcp-multithreaded-mapper-trunk.2.patch, HADOOP-distcp-multithreaded-mapper-trunk.3.patch
> Although distcp allows users to control parallelism via the number of mappers, it is sometimes
> desirable to run fewer mappers but more threads per mapper. Since distcp is network-bound
> (either by throughput or, more frequently, by the latency of creating connections, opening
> files, reading/writing files, and closing files), this can make each mapper much more efficient.
> In that way, many resources can be shared, so we can save memory and connections to
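The trade-off described in the issue (fewer mappers, more threads per mapper, for copies dominated by latency) can be illustrated with a small stand-alone sketch. It is not DistCp's code and does not use Hadoop's MultithreadedMapper API: `copy_file` and `run_mapper` are hypothetical names, and the per-file network latency is simulated with `time.sleep`, assuming each copy is dominated by waiting rather than CPU work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def copy_file(path, latency_s=0.05):
    """Hypothetical stand-in for copying one file.

    The cost is modelled as pure waiting (connect/open/read/write/close
    latency), simulated with time.sleep; no data is actually moved.
    """
    time.sleep(latency_s)
    return path


def run_mapper(paths, threads=1, latency_s=0.05):
    """Copy every path in one mapper's split using `threads` worker threads.

    Returns (copied paths in input order, elapsed wall-clock seconds).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in the same order as `paths`.
        copied = list(pool.map(lambda p: copy_file(p, latency_s), paths))
    return copied, time.perf_counter() - start


if __name__ == "__main__":
    split = [f"/src/part-{i:05d}" for i in range(8)]
    for n in (1, 4):
        done, secs = run_mapper(split, threads=n)
        print(f"threads={n}: copied {len(done)} files in {secs:.2f}s")
```

Because each simulated copy mostly waits, the threads overlap their waits and the wall-clock time for one split drops roughly in proportion to the thread count, while only one mapper process (and its memory) is needed. For CPU-bound work in CPython the GIL would limit this gain; the sketch only models the latency-bound case the issue describes.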

This message was sent by Atlassian JIRA
