hadoop-common-issues mailing list archives

From "Ravi Prakash (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13091) DistCp masks potential CRC check failures
Date Wed, 04 May 2016 21:38:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271478#comment-15271478 ]

Ravi Prakash commented on HADOOP-13091:

# To be more specific, could you please rename {{RetriableFileCopyCommand.ignoreFailures}}? In {{CopyMapper}}, {{ignoreFailures}} makes sense.
# I don't think we need the extensive changes to {{DistCpUtils.checksumsAreEqual}} anyway. We should just rethrow the IOException and handle it properly in either {{doCopy}} or {{compareCheckSums}}.
# *IF* you were to keep the changes in {{DistCpUtils.checksumsAreEqual}}, could you please also fix the javadoc? e.g. this will no longer be true:
{code}
   * If checksums's can't be retrieved, it doesn't fail the test
   * Only time the comparison would fail is when checksums are
   * available and they don't match
{code}
# {{throw new IOException(msg);}} We are throwing away whatever information {{e}} may contain.
I don't think that's a good idea.
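
The point above can be sketched in isolation. This is a minimal, hypothetical illustration (the class and method names here are mine, not DistCp's) of passing the caught exception as the cause, so the original failure information is preserved in the stack trace:
{code}
import java.io.IOException;

public class ChecksumFailureExample {

    // Stand-in for the comparison path: the underlying I/O failure is
    // wrapped, not discarded, by using the (message, cause) constructor.
    static void compareCheckSums() throws IOException {
        try {
            // Stand-in for getFileChecksum() failing.
            throw new IOException("filesystem unreachable");
        } catch (IOException e) {
            String msg = "Checksum mismatch between source and target";
            // Preserve the underlying failure as the cause:
            throw new IOException(msg, e);
        }
    }

    public static void main(String[] args) {
        try {
            compareCheckSums();
        } catch (IOException e) {
            System.out.println(e.getMessage());
            System.out.println("caused by: " + e.getCause().getMessage());
        }
    }
}
{code}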
# {code}
+    if (!ignoreFailures) {
+      if (sourceChecksum == null) {
+        LOG.warn("Unable to retrieve checksum for " + source);
+        return false;
+      }
+      if (targetChecksum == null) {
+        LOG.warn("Unable to retrieve checksum for " + target);
+        return false;
+      }
+    }
{code}
can be simplified to
{code}
    if (!ignoreFailures) {
      if (sourceChecksum == null || targetChecksum == null) {
        LOG.warn("Unable to retrieve checksum for "
            + (sourceChecksum == null ? source : target));
        return false;
      }
    }
{code}

This would also be a backward incompatible change (because we are changing the behavior of
existing options), so we should label the JIRA as such. I propose we still target version
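
The strict behavior being discussed can be modeled without any Hadoop dependencies. This is a hedged sketch (checksums stand in as plain {{String}}s; the real method takes {{FileSystem}} and {{Path}} arguments): a null checksum fails the comparison unless failures are explicitly ignored, instead of silently passing as in the current code:
{code}
public class StrictChecksumSketch {

    // Simplified model of the proposed comparison: a missing checksum
    // is a failed check in strict mode, rather than an implicit pass.
    static boolean checksumsAreEqual(String sourceChecksum, String targetChecksum,
                                     boolean ignoreFailures) {
        if (sourceChecksum == null || targetChecksum == null) {
            // Lenient mode keeps today's behavior; strict mode fails.
            return ignoreFailures;
        }
        return sourceChecksum.equals(targetChecksum);
    }

    public static void main(String[] args) {
        System.out.println(checksumsAreEqual("abc", "abc", false)); // true: both present, equal
        System.out.println(checksumsAreEqual(null, "abc", false));  // false: strict, missing checksum
        System.out.println(checksumsAreEqual(null, "abc", true));   // true: lenient, missing checksum
    }
}
{code}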

> DistCp masks potential CRC check failures
> -----------------------------------------
>                 Key: HADOOP-13091
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13091
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Elliot West
>            Assignee: Lin Yiqun
>         Attachments: HDFS-10338.001.patch, HDFS-10338.002.patch
> There appear to be edge cases whereby CRC checks may be circumvented when requests for
checksums from the source or target file system fail. In this event CRCs could differ between
the source and target and yet the DistCp copy would succeed, even when the 'skip CRC check'
option is not being used.
> The code in question is contained in the method [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying to read
the source or target checksum then the method will return {{true}} (i.e.  the checksums are
equal), implying that the check succeeded. In actual fact we just failed to obtain the checksum
and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
>     sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
>     + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be re-thrown. If this
is not deemed desirable then I believe an option ({{--strictCrc}}?) should be added to enforce
a strict check where we require that both the source and target CRCs are retrieved, are not
null, and are then compared for equality. If for any reason either of the CRCs retrievals
fail then an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to {{FileSystem.getFileChecksum(...)}}
return {{null}} in these instances. I would suggest that these should fail a strict CRC check
to prevent users developing a false sense of security in their copy pipeline.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
