hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12689) S3 filesystem operations stopped working correctly
Date Thu, 07 Jan 2016 10:19:40 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15087169#comment-15087169
] 

Steve Loughran commented on HADOOP-12689:
-----------------------------------------

-1

Ravi, please, no: not without new tests to show this problem is fixed. If the last fix caused
a regression, then we need something more to show the regression has gone away. That doesn't
have to be a general-purpose contract test; something in the hadoop-tools/hadoop-aws package will do.


Look at the Jenkins output:
{code}
The patch doesn't appear to include any new or modified tests. Please justify why no new tests
are needed for this patch. Also please list what manual steps were performed to verify this
patch.
{code}

S3n and its siblings are a burning sore in the Hadoop codebase: undermaintained, undertested and
incredibly brittle to change.

If HADOOP-10542 did break things (and I trust your claim there), then it slipped through
the current s3 test suite. We need another test to make sure this problem never comes back.
It doesn't have to be a full contract test; something in hadoop-aws will be enough. But saying
"we can add a test later" isn't the right tactic: we both know "later" means "never" in
this context. We also need all the existing s3 tests run against this patch to make sure it
doesn't change anything else, either.
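To make the failure mode concrete: a minimal, self-contained Java sketch (not Hadoop's actual code; class and method names here are illustrative) of why the exception type matters. `FileSystem.exists()` probes with `getFileStatus()` and treats `FileNotFoundException` as "absent", so a store layer that throws a plain `IOException` for a missing key, where it used to return null, turns every probe for a missing path into a hard failure:

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class ExistsSemantics {

    // Stand-in for the store lookup; throwPlainIOE mimics the HADOOP-10542
    // change of throwing IOException where null used to be returned.
    static String getFileStatus(String path, boolean throwPlainIOE) throws IOException {
        if (path.equals("/present")) {
            return "status:" + path;
        }
        if (throwPlainIOE) {
            throw new IOException(path + " doesn't exist");   // the regression
        }
        throw new FileNotFoundException(path);                // expected contract
    }

    // Mirrors the shape of FileSystem.exists(): only FileNotFoundException
    // maps to false; any other IOException propagates to the caller.
    static boolean exists(String path, boolean throwPlainIOE) throws IOException {
        try {
            getFileStatus(path, throwPlainIOE);
            return true;
        } catch (FileNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(exists("/present", false));  // true
        System.out.println(exists("/missing", false));  // false: FNFE is caught
        try {
            exists("/missing", true);
        } catch (IOException e) {
            // with the regression, the probe itself blows up
            System.out.println("IOException escaped exists(): " + e.getMessage());
        }
    }
}
```

A regression test for this would assert exactly the second and third cases: `getFileStatus` on a missing path raises `FileNotFoundException` (not a bare `IOException`), and `exists` on a missing path returns false rather than throwing.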

Can we roll this back and do another iteration of the patch which does include a test?
Enforcing the "patches include tests" policy is the only way we can keep test coverage up,
especially on something this brittle.

sorry

> S3 filesystem operations stopped working correctly
> --------------------------------------------------
>
>                 Key: HADOOP-12689
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12689
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 2.7.0
>            Reporter: Matthew Paduano
>            Assignee: Matthew Paduano
>              Labels: S3
>             Fix For: 2.8.0
>
>         Attachments: HADOOP-12689.01.patch
>
>
> HADOOP-10542 was resolved by replacing "return null;" with throwing an IOException. This
causes several S3 filesystem operations to fail (possibly more code is expecting that null
return value; these are just the calls I noticed):
> S3FileSystem.getFileStatus() (which no longer raises FileNotFoundException but instead
raises IOException)
> FileSystem.exists() (which no longer returns false but instead raises IOException)
> S3FileSystem.create() (which no longer succeeds but instead raises IOException)
> Run command:
> hadoop distcp hdfs://localhost:9000/test s3://xxx:yyy@com.bar.foo/
> Resulting stack trace:
> 2015-12-11 10:04:34,030 FATAL [IPC Server handler 6 on 44861] org.apache.hadoop.mapred.TaskAttemptListenerImpl:
Task: attempt_1449826461866_0005_m_000006_0 - exited : java.io.IOException: /test doesn't
exist
> at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:170)
> at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy17.retrieveINode(Unknown Source)
> at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:230)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Changing the "raise IOE..." back to "return null" fixes all of the above code sites and allows
distcp to succeed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
