hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13887) Support for client-side encryption in S3A file system
Date Wed, 01 Nov 2017 14:42:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234156#comment-16234156 ]

Steve Loughran commented on HADOOP-13887:

This initial patch was just about turning client-side encryption on. Doing that produces
data whose EOF may come slightly before len(block), which will break all client code that
navigates relative to EOF, assumes the length of the data is the amount it can copy, and so on. And
if you lose the key, you are on your own.
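
To make the failure mode concrete, here is a minimal sketch in plain java.io (no Hadoop or AWS classes) of a reader that trusts the reported length. The class name, method name, and the 16-byte overhead are assumptions chosen for illustration only; the real overhead depends on the cipher and SDK in use.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

public class LengthMismatchDemo {

    /**
     * Reads exactly reportedLength bytes, as a client that trusts the listed
     * object length would. Returns false if the stream ends before that.
     */
    public static boolean readTrustingLength(byte[] decrypted, int reportedLength) {
        byte[] buffer = new byte[reportedLength];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(decrypted))) {
            in.readFully(buffer); // throws EOFException if the data runs out early
            return true;
        } catch (EOFException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plaintext = new byte[100];
        // Hypothetical: the store reports the encrypted size, 16 bytes larger.
        int reportedLength = plaintext.length + 16;
        System.out.println("length matches data:    "
                + readTrustingLength(plaintext, plaintext.length));
        System.out.println("length overstates data: "
                + readTrustingLength(plaintext, reportedLength));
    }
}
```

Any code that sizes a buffer, computes a split, or seeks to a footer from the reported length hits the same mismatch.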

At the same time, I can see the appeal of some form of support for this purely for a backup/restore
process, e.g. encrypting data before it goes to Glacier and decrypting it as part of a copy. I
think that can/should be done outside the s3a lib: you can never reliably use client-side encrypted
S3 data as a source in any MR, Hive, Tez, Spark &c operation. People will end up encrypting
their data, then filing bugs/support calls trying to understand why their queries are all
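
As a rough illustration of doing this outside the s3a lib, the sketch below encrypts a payload before it would be uploaded and decrypts it after it has been copied back, using only javax.crypto. The class and method names are hypothetical, AES-GCM is an assumption rather than anything the patch or the AWS SDK prescribes, and key storage is left out entirely.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class BackupCryptoSketch {
    private static final int IV_BYTES = 12;   // nonce stored in front of the ciphertext
    private static final int TAG_BITS = 128;  // GCM authentication tag size

    /** Generates a throwaway AES key; a real process would fetch one from a key store. */
    public static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns IV + ciphertext + tag: the bytes that would be uploaded. */
    public static byte[] encrypt(SecretKey key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(plaintext);
            byte[] stored = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, stored, 0, iv.length);
            System.arraycopy(body, 0, stored, iv.length, body.length);
            return stored;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reverses encrypt(): reads the IV from the front, then decrypts and verifies the rest. */
    public static byte[] decrypt(SecretKey key, byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] original = "rows to archive".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        byte[] stored = encrypt(key, original);
        byte[] restored = decrypt(key, stored);
        System.out.println("stored is longer than plaintext: " + (stored.length > original.length));
        System.out.println("round trip ok: " + java.util.Arrays.equals(original, restored));
    }
}
```

The stored form is longer than the plaintext, so it only makes sense as an archive format that is decrypted before any query engine reads it.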

*Proposed*: change the title of the JIRA to "Encrypt S3A data client-side with AWS SDK" to make the
goal clear, then close as a wontfix with a clear explanation. It's not that we can't take on the code
that Igor has done; it's that the assumption that EOF=Len(file) is so fundamental that we can't
hand this data to downstream code and expect it to cope.

The other grand proposal is, well, big. And as it goes near KMS & encryption, it is beyond my
scope. It also isn't going to interoperate with any other S3 client, which is a significant limitation.
I'm certainly not going to go near it, and I wouldn't be in a position to review anything but the
"how does this glue to the input stream" issue. And even there, fear would generally keep me
away from it.

*Proposed*: create a new JIRA, "Encrypt S3A data client-side with Hadoop libraries &
Hadoop KMS", put that proposal there, and for now let people comment on it & see
where it goes.

> Support for client-side encryption in S3A file system
> -----------------------------------------------------
>                 Key: HADOOP-13887
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13887
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Jeeyoung Kim
>            Assignee: Igor Mazur
>            Priority: Minor
>         Attachments: HADOOP-13887-002.patch, HADOOP-13887-007.patch, HADOOP-13887-branch-2-003.patch,
HADOOP-13897-branch-2-004.patch, HADOOP-13897-branch-2-005.patch, HADOOP-13897-branch-2-006.patch,
HADOOP-13897-branch-2-008.patch, HADOOP-13897-branch-2-009.patch, HADOOP-13897-branch-2-010.patch,
HADOOP-13897-branch-2-012.patch, HADOOP-13897-branch-2-014.patch, HADOOP-13897-trunk-011.patch,
HADOOP-13897-trunk-013.patch, HADOOP-14171-001.patch, S3-CSE Proposal.pdf
> Expose the client-side encryption option documented in Amazon S3 documentation  - http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html
> Currently this is not exposed in Hadoop but it is exposed as an option in AWS Java SDK,
which Hadoop currently includes. It should be trivial to propagate this as a parameter passed
to the S3client used in S3AFileSystem.java

