hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12756) Incorporate Aliyun OSS file system implementation
Date Sat, 28 May 2016 11:11:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305306#comment-15305306
] 

Steve Loughran commented on HADOOP-12756:
-----------------------------------------

I concur with [~aw]: if a Jenkins server is running submitted code, then it is precisely one
patch submission away from having its credentials leaked.

There's a set of problems that need to be addressed when working with object stores:
# Development: does your own code work?
# Patch review: does a newly submitted patch work?
# Regression testing: does the branch/trunk work?

*Development* Development in a module for a specific infra (AWS, OpenStack, Azure) must,
obviously, require the credentials to test there. More subtly, changes to the filesystem APIs
*and tests* need testing too. In HADOOP-13207, for example, I have to test all implementations
of an abstract contract test: local, rawlocal, HDFS, s3a, azure.

*Patch review* This is what makes reviewing object store patches hard. The reviewer
needs to have the credentials, and must first prescan the patch to make sure it doesn't leak
information (through either a malicious attack or simply over-zealous logging). Then they need
to do a test run, which takes about 30-60 minutes, which is why it's pretty frustrating if the
patch fails. Hence the policy: nobody will look at your patch until you declare which infra
your tests successfully completed against. It forces the developers to apply due diligence.
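
As a rough illustration of that prescan step, here is a minimal sketch that flags added lines in a unified diff which mention credential-like names or log configuration values. The patterns and the function name `prescan` are assumptions made up for this example, not an existing Hadoop tool, and a match only means "a human should look here"; it is no substitute for actually reading the patch.

```python
import re

# Hypothetical patterns; a real reviewer would tune these per object store.
SUSPICIOUS = [
    re.compile(r"(secret|access)[._-]?key", re.IGNORECASE),
    re.compile(r"password", re.IGNORECASE),
    re.compile(r"LOG\.\w+\(.*conf", re.IGNORECASE),
]


def prescan(patch_text):
    """Return (line_number, text) for added lines matching a suspicious pattern."""
    hits = []
    for number, line in enumerate(patch_text.splitlines(), start=1):
        # Only inspect lines the patch adds; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if any(pattern.search(line) for pattern in SUSPICIOUS):
            hits.append((number, line[1:].strip()))
    return hits
```

Run against the text of a submitted patch, this returns the added lines worth reading by hand before any credentials go near a test run.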

Maybe, just maybe, this could be partially automated. As an example, in Spark PRs a set of
committers can add a comment, "Jenkins, test this", and the UCB Jenkins engine will run a test.
If we could do something like that, with a patch test only kicking off after human intervention,
we could improve patch review.
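
To make the "only after human intervention" idea concrete, here is a minimal sketch of such a gate: a test run is allowed only when the comment comes from someone on a committer list and contains the trigger phrase. The function name, the case-insensitive matching, and the committer set are assumptions for illustration; this is not how the Spark Jenkins setup is actually implemented.

```python
# Trigger phrase quoted from the Spark example; the matching rules are assumed.
TRIGGER_PHRASE = "jenkins, test this"


def should_run_tests(author, comment, committers):
    """Return True only when a known committer posts the trigger phrase."""
    if author not in committers:
        # Comments from anyone else never start a run with credentials.
        return False
    return TRIGGER_PHRASE in comment.lower()
```

The point of the design is that the decision to run untrusted code next to credentials stays with a person who has already read the patch.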

*Regression Testing*

This is an area where a private Jenkins instance with the credentials can contribute: nightly
test runs of the object store module(s), and a process for reacting to failures. We do this
internally a lot, where the escalation process is: someone gets to fix the failure. It's that
escalation process which needs to be set up. It's not enough for a private Jenkins machine/VM
to send emails saying a test run failed; it needs people on the developer lists who
care and can react. That means you get to stay on the dev lists. Welcome!

Note that in SPARK-7481 I'm adding end-to-end testing through Spark; you can see it at work
by comparing an s3a test run with the hadoop-2.6 profile vs hadoop-2.7. The 2.6 one is clearly
broken; if we'd had those tests up earlier, that'd have been clear at the time. I'm designing
that module to be extensible: once it's in, adding dependencies and tests for a new FS should
be straightforward.


> Incorporate Aliyun OSS file system implementation
> -------------------------------------------------
>
>                 Key: HADOOP-12756
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12756
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 2.8.0
>            Reporter: shimingfei
>            Assignee: shimingfei
>         Attachments: HADOOP-12756-v02.patch, HCFS User manual.md, OSS integration.pdf,
OSS integration.pdf
>
>
> Aliyun OSS is widely used among China’s cloud users, but currently it is not easy to
access data stored on OSS from a user’s Hadoop/Spark application, because Hadoop has no native
support for OSS.
> This work aims to integrate Aliyun OSS with Hadoop. With simple configuration, Spark/Hadoop
applications can read/write data from OSS without any code change, narrowing the gap between
the user’s application and data storage, as has been done for S3 in Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

