hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-15297) Make s3a etag -> checksum publishing option
Date Wed, 07 Mar 2018 20:49:01 GMT
Steve Loughran created HADOOP-15297:

             Summary: Make s3a etag -> checksum publishing option
                 Key: HADOOP-15297
                 URL: https://issues.apache.org/jira/browse/HADOOP-15297
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 3.1.0
            Reporter: Steve Loughran
            Assignee: Steve Loughran

HADOOP-15273 shows how distcp doesn't handle non-HDFS filesystems with checksums.

Exposing Etags as checksums, HADOOP-13282, breaks workflows which back up to s3a.

Rather than revert that change, I want to make it an option, off by default. Once we are
happy with distcp's handling of checksums, we can turn it on.

Why an option? Because it lines up with a successor to distcp which saves src and dest checksums
to a file and can then verify whether or not files have really changed. Currently distcp relies
on the dest checksum algorithm being the same as the src for incremental updates, but if either
of the stores doesn't serve checksums, it silently downgrades to not checking.
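
A minimal sketch of the "save src and dest checksums to a file" idea, in Python for brevity.
It is not distcp code and not a proposed design: the function names, the manifest layout and
the checksum strings are all hypothetical. It only shows why recording both checksums lets a
later run detect change even when the two stores use different algorithms, and why a missing
checksum should be reported as "cannot verify" rather than silently skipped.

```python
import json
from typing import Dict, Optional

# Hypothetical manifest layout: path -> {"src": <checksum>, "dest": <checksum>}.
Manifest = Dict[str, Dict[str, Optional[str]]]


def record(manifest: Manifest, path: str,
           src_sum: Optional[str], dest_sum: Optional[str]) -> None:
    """Remember the checksums both stores reported right after a copy."""
    manifest[path] = {"src": src_sum, "dest": dest_sum}


def unchanged(manifest: Manifest, path: str,
              src_sum: Optional[str], dest_sum: Optional[str]) -> bool:
    """Return True only if both sides still report the recorded checksums.

    Each checksum is compared with the value previously recorded for the
    same store, so src and dest never need to share an algorithm.
    A missing entry or missing checksum counts as "not verified".
    """
    entry = manifest.get(path)
    if entry is None or src_sum is None or dest_sum is None:
        return False
    return entry["src"] == src_sum and entry["dest"] == dest_sum


def dumps(manifest: Manifest) -> str:
    """Serialize the manifest so it can be saved alongside the backup."""
    return json.dumps(manifest, sort_keys=True)


def loads(text: str) -> Manifest:
    """Load a manifest written by dumps()."""
    return json.loads(text)
```

For example, after `record(m, "/data/a", "crc:1234", "etag:abcd")`, a later call
`unchanged(loads(dumps(m)), "/data/a", "crc:1234", "etag:abcd")` is true, while changing
either checksum, or passing `None` for one of them, makes it false.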
