hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13145) In DistCp, prevent unnecessary getFileStatus call when not preserving metadata.
Date Fri, 13 May 2016 22:57:13 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HADOOP-13145:
    Attachment: HADOOP-13145.001.patch

The attached v001 patch avoids the unnecessary {{getFileStatus}} call.

The impact of the extra call is particularly pronounced when running DistCp with a
destination on S3A, where eventual consistency on S3 can cause the {{getFileStatus}} call
to fail with {{FileNotFoundException}}.  When that happens, the whole MapReduce task fails,
retries, and repeats copying all the data.

[~rajesh.balamohan], I know you saw this with some recent large copies to S3A.  Would you
be interested in trying a test with this patch?  So far, I don't have my own repro.  Note
that this patch only helps as long as the DistCp command is not preserving metadata
attributes, so don't use the {{-p}} option.
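
For illustration only, here is a minimal sketch of the idea, not the code in the attached
patch.  All names in it ({{SkipStatusSketch}}, {{Attr}}, {{TargetFs}}, {{NotYetVisibleFs}},
{{afterCopy}}) are hypothetical stand-ins rather than DistCp or Hadoop APIs.  It assumes the
post-copy step knows the set of attributes to preserve and returns early when that set is
empty, so a destination that does not yet list the new file is never queried.

```java
import java.io.FileNotFoundException;
import java.util.EnumSet;
import java.util.Set;

public class SkipStatusSketch {
    /** Hypothetical stand-in for the attributes a copy could be asked to preserve. */
    enum Attr { REPLICATION, BLOCK_SIZE, USER, GROUP, PERMISSION }

    /** Hypothetical stand-in for the destination file system. */
    interface TargetFs {
        String getFileStatus(String path) throws FileNotFoundException;
    }

    /** Counts calls and always fails, mimicking a store that does not yet list the file. */
    static class NotYetVisibleFs implements TargetFs {
        int calls = 0;

        @Override
        public String getFileStatus(String path) throws FileNotFoundException {
            calls++;
            throw new FileNotFoundException(path);
        }
    }

    /** Post-copy step: look up the target status only if something must be preserved. */
    static void afterCopy(TargetFs fs, String target, Set<Attr> preserve)
            throws FileNotFoundException {
        if (preserve.isEmpty()) {
            return; // nothing to compare or update, so skip the extra round trip
        }
        String status = fs.getFileStatus(target);
        // A real implementation would compare 'status' with the source and update attributes.
    }

    public static void main(String[] args) {
        NotYetVisibleFs fs = new NotYetVisibleFs();
        try {
            // No attributes to preserve: the status lookup is skipped entirely.
            afterCopy(fs, "s3a://bucket/file", EnumSet.noneOf(Attr.class));
            System.out.println("calls without preserve: " + fs.calls);
            // Preserving an attribute: the lookup happens and may fail.
            afterCopy(fs, "s3a://bucket/file", EnumSet.of(Attr.USER));
        } catch (FileNotFoundException e) {
            System.out.println("calls with preserve: " + fs.calls + " (lookup failed)");
        }
    }
}
```

In this sketch the first call completes without touching the file system, while the second
still performs the lookup and fails, which matches the note above that the change only helps
when {{-p}} is not used.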

Cc [~stevel@apache.org].

> In DistCp, prevent unnecessary getFileStatus call when not preserving metadata.
> -------------------------------------------------------------------------------
>                 Key: HADOOP-13145
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13145
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13145.001.patch
>
> After DistCp copies a file, it calls {{getFileStatus}} to get the {{FileStatus}} from
> the destination so that it can compare to the source and update metadata if necessary.  If
> the DistCp command was run without the option to preserve metadata attributes, then this
> additional {{getFileStatus}} call is wasteful.

