hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15576) S3A Multipart Uploader to work with S3Guard and encryption
Date Wed, 25 Jul 2018 19:02:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556138#comment-16556138

Steve Loughran commented on HADOOP-15576:

you're going to hate me for this, but here's my review, including going through all of the
base test class. I'll note that when that python spec of the operations surfaces, the pre/post
conditions of all the operations will show which tests are needed to try and break things,
and what exceptions are expected on failure. 

h3. {{S3AMultipartUploader}}

{{putPart}} to use {{writeOperations.newUploadPartRequest}}. (FWIW, just realised the partID
range check may be out of date w.r.t. the latest AWS S3 limits. Feel free to remove that partID
< 10000 check, as the far end will reject anything out of range, won't it?)

h3. Dependencies on HDFS. 

I've just seen that DFSUtilClient and various other HDFS imports are needed for all this to
work. As it is separate from the S3A FS, it's probably OK to do this *for now*, but really it
should be using stuff from hadoop-common fs, not hdfs-client. We should not require hdfs-client
to be on the CP for S3 IO to work. I don't see the POM explicitly pulling in HDFS except for
testing, so it must be coming in transitively from somewhere. Pause: looks like it came in
with the committer and its uprate of hadoop-mapreduce-client-core from test to provided. Filed
HADOOP-15631 and HDFS-13766 to cover this. I don't believe that HD/Insights includes the HDFS
JARs, so everything needed for multipart uploads will need to move to the hadoop-common lib.

*It's too late to address in 3.2, but it does need fixing, initially within HDFS*

h3. Tests

* {{AbstractSystemMultipartUploaderTest}} should move to {{AbstractFSContractTestBase}}. Gives
you the timeouts, the thread naming, the logic for unique paths in forks for parallel test
runs, and a binding mechanism which is used for all the contract tests.
* All asserts need to have meaningful messages so that failures in Jenkins can be tracked.
* All uses of fs.open need to be closed in try/finally or try-with-resources. I don't see IOUtils.toByteArray
closing it. Better, use {{ContractTestUtils.readUTF8}} and have it do all the work; then you
can just compare strings (which gives you better text on a mismatch).
* And before you open the file, use {{ContractTestUtils.assertPathExists}}. Why? It does a listing
of the parent dir and includes it in the assertion on a failure.
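The close-and-compare pattern in the bullets above can be sketched in plain JDK Java; here {{readStreamUTF8}} is an illustrative stand-in for {{ContractTestUtils.readUTF8}}, and a {{ByteArrayInputStream}} stands in for {{fs.open}}:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadFully {
    // Stand-in for ContractTestUtils.readUTF8: read the whole stream as UTF-8,
    // closing it via try-with-resources even if the read fails partway.
    static String readStreamUTF8(InputStream in) throws IOException {
        try (InputStream stream = in;
             ByteArrayOutputStream buf = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
            "part-0001".getBytes(StandardCharsets.UTF_8));
        // Comparing strings gives readable mismatch text, unlike raw byte[] asserts.
        System.out.println(readStreamUTF8(in).equals("part-0001") ? "match" : "mismatch");
    }
}
```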

{{testMultipartUpload}} and {{testMultipartUploadReverseOrder}}. If you aren't using assertArrayEquals,
you'll need a message for the size mismatch, ideally including the difference. Best to convert
the array to a string ({{org.apache.commons.collections.CollectionUtils}} or Java 8 streams)
and include that in the message.
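A minimal self-contained sketch of such a message, using {{java.util.Arrays.toString}} rather than the Commons utilities (the method name here is illustrative):

```java
import java.util.Arrays;

public class ArrayAssertMessage {
    // Build an assertion message that includes both sides of the comparison,
    // so a Jenkins failure log shows the actual difference, not just "false".
    static String mismatchMessage(byte[] expected, byte[] actual) {
        return "Upload size mismatch: expected " + expected.length
            + " bytes " + Arrays.toString(expected)
            + " but got " + actual.length
            + " bytes " + Arrays.toString(actual);
    }

    public static void main(String[] args) {
        byte[] expected = {1, 2, 3};
        byte[] actual = {1, 2};
        System.out.println(mismatchMessage(expected, actual));
    }
}
```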

{{testMultipartUploadAbort}}: should use LambdaTestUtils.intercept, e.g.

intercept(FileNotFoundException.class, () -> fs.listStatus(file));

This will handle the rethrowing of any unexpected exception, and if the lambda expression returns a
value, will raise an assertion including the string value of the response. This also means
you can remove the import of {{junit.framework.TestCase.assertTrue}}, which is good as the
assert is coming from the old JUnit classes.
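The behaviour described above can be sketched with a stand-alone simplified version of what {{LambdaTestUtils.intercept}} does (this is an illustration of the contract, not the actual Hadoop implementation):

```java
import java.util.concurrent.Callable;

public class Intercept {
    // Simplified sketch of LambdaTestUtils.intercept: run the callable and
    // expect it to throw `expected`; rethrow anything unexpected; if no
    // exception is raised, fail with the string value of the returned result.
    static <E extends Exception> E intercept(Class<E> expected, Callable<?> call)
            throws Exception {
        try {
            Object result = call.call();
            // AssertionError is not an Exception, so the catch below won't swallow it.
            throw new AssertionError("Expected " + expected.getSimpleName()
                + " but call returned: " + result);
        } catch (Exception e) {
            if (expected.isInstance(e)) {
                return expected.cast(e);  // caught: hand back for further asserts
            }
            throw e;  // unexpected exception: rethrow for the test runner
        }
    }

    public static void main(String[] args) throws Exception {
        IllegalStateException ex = intercept(IllegalStateException.class,
            () -> { throw new IllegalStateException("aborted"); });
        System.out.println(ex.getMessage());  // prints "aborted"
    }
}
```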

{{testMultipartUploadAbort}} to try to {{complete()}} the upload after the abort; expect a failure.
Then assert that the dest file doesn't exist with {{ContractTestUtils.assertPathDoesNotExist}}.

Extra tests to add.

- Add a test to try and abort an unknown upload
- test to try to complete unknown upload
- try to complete an upload twice
- try to abort an upload twice. What is expected here?
- initiate two MPUs to the same dest. What happens? (interesting q; outcome may vary on FS)
- Initiate MPU to a directory; expect failure
- Ser/deser of an upload handle. Needed to detect accidental use of non-serializable stuff
within a single process.
- try to delete a path partway through an upload. Again, what happens?
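For the ser/deser test, the core check is a round trip through bytes; a toy sketch with a string upload ID standing in for the real opaque handle (the names {{toBytes}}/{{fromBytes}} are illustrative, not the actual upload-handle API):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HandleRoundTrip {
    // Toy stand-in for an upload handle: an opaque byte[] wrapping the upload ID.
    static byte[] toBytes(String uploadId) {
        return uploadId.getBytes(StandardCharsets.UTF_8);
    }

    static String fromBytes(byte[] handle) {
        return new String(handle, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String uploadId = "mpu-0001";
        byte[] wire = toBytes(uploadId);                 // "serialize"
        byte[] copy = Arrays.copyOf(wire, wire.length);  // stand-in for crossing a process boundary
        // A round trip must reproduce the original ID exactly; a handle holding
        // non-serializable in-process state would fail a check like this.
        System.out.println(fromBytes(copy).equals(uploadId)
            ? "round-trip ok" : "round-trip broken");
    }
}
```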

> S3A  Multipart Uploader to work with S3Guard and encryption
> -----------------------------------------------------------
>                 Key: HADOOP-15576
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15576
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2
>            Reporter: Steve Loughran
>            Assignee: Ewan Higgs
>            Priority: Blocker
>         Attachments: HADOOP-15576.001.patch
> The new Multipart Uploader API of HDFS-13186 needs to work with S3Guard, with the tests
to demonstrate this
> # move from low-level calls of S3A client to calls of WriteOperationHelper; adding any
new methods needed there.
> # Tests. the tests of HDFS-13713. 
> # test execution, with -DS3Guard, -DAuth
> There isn't an S3A version of {{AbstractSystemMultipartUploaderTest}}, and even if there
was, it might not show that S3Guard was bypassed, because there's no checks that listFiles/listStatus
shows the newly committed files.
> Similarly, because MPU requests are initiated in S3AMultipartUploader, encryption settings
aren't picked up. Files being uploaded this way *are not being encrypted*.

This message was sent by Atlassian JIRA
