hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13145) In DistCp, prevent unnecessary getFileStatus call when not preserving metadata.
Date Thu, 19 May 2016 09:39:12 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290800#comment-15290800 ]

Steve Loughran commented on HADOOP-13145:

There's no S3 service in my country, so I need to test against a datacentre in a country with
a lower tax regime that is still covered by EU data protection legislation: Ireland. I could
also benchmark Frankfurt.

If you think the large files repeat the same coverage as the smaller ones, yes, please unify.
Even so, I'd like the file size to be configurable so that I could set up test runs with
smaller datasets, and so that we keep the option of test runs with larger files.

For those tests, it'd be nice if the S3A setup explicitly turned the multipart threshold down
(8MB?) and the same for partition sizes, so that it'd test the multipart code path and distcp
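
To make the suggestion above concrete, here is a rough, self-contained sketch of what turning both values down to 8MB would mean for a test file. The property names `fs.s3a.multipart.threshold` and `fs.s3a.multipart.size` are the S3A multipart keys and are not named in this message; the class, its helpers, and the "size greater than threshold" rule are illustrative assumptions, not the connector's actual logic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MultipartTestSettings {
    static final long MB = 1024L * 1024L;

    /** Hypothetical test settings: both values lowered to 8MB, expressed in bytes. */
    static Map<String, Long> smallMultipartSettings() {
        Map<String, Long> settings = new LinkedHashMap<>();
        settings.put("fs.s3a.multipart.threshold", 8 * MB);
        settings.put("fs.s3a.multipart.size", 8 * MB);
        return settings;
    }

    /** Simplifying assumption: an upload takes the multipart path once it exceeds the threshold. */
    static boolean usesMultipart(long fileSize, long threshold) {
        return fileSize > threshold;
    }

    /** Number of fixed-size partitions needed to cover the file (ceiling division). */
    static long partCount(long fileSize, long partSize) {
        return (fileSize + partSize - 1) / partSize;
    }

    public static void main(String[] args) {
        Map<String, Long> settings = smallMultipartSettings();
        long fileSize = 20 * MB; // a modest test file, far smaller than a production-sized one
        long threshold = settings.get("fs.s3a.multipart.threshold");
        long partSize = settings.get("fs.s3a.multipart.size");
        System.out.println("multipart: " + usesMultipart(fileSize, threshold));
        System.out.println("parts: " + partCount(fileSize, partSize));
    }
}
```

Under these assumptions even a 20MB file would be split into several parts, so a small dataset would still exercise the multipart path.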

> In DistCp, prevent unnecessary getFileStatus call when not preserving metadata.
> -------------------------------------------------------------------------------
>                 Key: HADOOP-13145
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13145
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13145.001.patch, HADOOP-13145.003.patch
> After DistCp copies a file, it calls {{getFileStatus}} to get the {{FileStatus}} from
> the destination so that it can compare to the source and update metadata if necessary. If
> the DistCp command was run without the option to preserve metadata attributes, then this additional
> {{getFileStatus}} call is wasteful.
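
The idea in the issue description can be sketched roughly as below. This is not the attached patch: `CountingStore`, `Attribute`, and `afterCopy` are hypothetical stand-ins, used only to show the remote status lookup being skipped when no attributes are to be preserved.

```java
import java.util.EnumSet;

public class PreserveCheckSketch {
    /** Hypothetical set of metadata attributes a copy might be asked to preserve. */
    enum Attribute { REPLICATION, BLOCK_SIZE, USER, GROUP, PERMISSION }

    /** Hypothetical stand-in for the destination filesystem; counts status lookups. */
    static class CountingStore {
        int statusCalls = 0;

        String getStatus(String path) {
            statusCalls++; // each call models one round trip to the destination
            return "status:" + path;
        }
    }

    /** Post-copy step: only fetch the destination status if there is metadata to preserve. */
    static void afterCopy(CountingStore target, String path, EnumSet<Attribute> preserve) {
        if (preserve.isEmpty()) {
            return; // nothing to compare or update, so the lookup would be wasted
        }
        String status = target.getStatus(path);
        // A real implementation would compare this status with the source and update attributes.
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        afterCopy(store, "/dest/file1", EnumSet.noneOf(Attribute.class));
        afterCopy(store, "/dest/file2", EnumSet.of(Attribute.USER));
        System.out.println("status calls: " + store.statusCalls);
    }
}
```

Against an object store, where each status lookup is a separate remote request, skipping it for every copied file is where the saving would come from.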

This message was sent by Atlassian JIRA

