hadoop-common-issues mailing list archives

From "Steve Moist (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15006) Encrypt S3A data client-side with Hadoop libraries & Hadoop KMS
Date Mon, 08 Jan 2018 23:35:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317345#comment-16317345 ]

Steve Moist commented on HADOOP-15006:

Do you have any good links for further reading on the crypto algorithms, particularly the NoPadding
variant you mention?
I've got a few links about general block ciphers and padding.  I'll post more as I find them.
* http://web.cs.ucdavis.edu/~rogaway/papers/modes.pdf is a good (and lengthy) doc on encryption modes;
look at page 5 for a summary and page 45 for more on CTR.
* https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Padding The obligatory Wikipedia overview of padding.
* https://www.cryptosys.net/pki/manpki/pki_paddingschemes.html

 (How do lengths and byte offsets map from the user data to the encrypted stream?)
They should map 1-1.  AES works on a fixed block size of 16 bytes, so reads happen on block
boundaries: bytes 0->15, 16->31, etc.  That means you can't read bytes 3->18 directly; you'd
have to read 0->31 to get 3->18.  I'm not certain, but I'd imagine HDFS transparent encryption
has the same issue and has already solved it.  It would just mean fetching the preceding block
as well to decrypt properly.  CTR allows random-access encryption/decryption, so I don't expect
this to be a performance problem; it's just a minor technical point.  So far in my testing I
haven't hit it, but I also haven't been directly invoking the MultiPartUpload.  This is the
only issue I see when randomly reading/writing blocks, and it's easily solvable.
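To make the block-boundary point concrete, here is a minimal, self-contained sketch using the JDK's AES/CTR cipher. The class and method names are mine (not from the patch): it shows rounding a requested byte range out to 16-byte block boundaries, and CTR's random access by advancing the counter IV to decrypt starting mid-stream.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.math.BigInteger;
import java.util.Arrays;

public class CtrRandomAccessDemo {
    static final int BLOCK = 16; // AES block size in bytes

    // Round an arbitrary byte range [start, end] out to block boundaries,
    // e.g. a request for bytes 3..18 becomes a read of bytes 0..31.
    static long[] alignToBlocks(long start, long end) {
        long alignedStart = (start / BLOCK) * BLOCK;
        long alignedEnd = ((end / BLOCK) + 1) * BLOCK - 1;
        return new long[] { alignedStart, alignedEnd };
    }

    // Advance a 16-byte CTR counter (the IV) by blockIndex blocks, big-endian.
    static byte[] advanceIv(byte[] iv, long blockIndex) {
        BigInteger ctr = new BigInteger(1, iv).add(BigInteger.valueOf(blockIndex));
        byte[] raw = ctr.toByteArray();
        byte[] out = new byte[BLOCK];
        int copy = Math.min(raw.length, BLOCK);
        System.arraycopy(raw, raw.length - copy, out, BLOCK - copy, copy);
        return out;
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        byte[] iv = new byte[BLOCK]; // all-zero IV, demo only

        byte[] plain = new byte[48];
        for (int i = 0; i < plain.length; i++) plain[i] = (byte) i;

        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] cipherText = enc.doFinal(plain);

        // Reading plaintext bytes 3..18 requires ciphertext bytes 0..31.
        long[] range = alignToBlocks(3, 18);
        System.out.println(range[0] + ".." + range[1]); // 0..31

        // Random access: decrypt from block 1 (byte 16) by advancing the IV by one block.
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(advanceIv(iv, 1)));
        byte[] tail = dec.doFinal(Arrays.copyOfRange(cipherText, 16, 48));
        System.out.println(tail[0]); // 16 - plaintext resumes at byte 16
    }
}
```

This is essentially the same counter arithmetic Hadoop's CTR stream ciphers use for seekable decryption, which is why mid-object reads stay cheap.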

What are the actual atomicity requirements?
This is a good question.  The main atomicity requirement I have is that once the S3a stream
is closed and the object committed, the OEMI is also committed.  I haven't fully worked that
out from a specific code perspective yet.
Specifically, how do we handle multiple clients racing to create the same path?
Using OEMI Storage Option #5: suppose userA uploads objectA with OEMIA to key1, and userB
uploads objectB with OEMIB to key1.  S3 doesn't guarantee which one wins, so it is possible
that objectA ends up stored with OEMIB.  This shouldn't happen if the OEMI is stored as object
metadata.  Alternatively, we could create a "lock" on the DynamoDB row so that userA owns that
location, preventing the upload of objectB.  In HDFS, once the INode is created it prevents
userB from creating that file; perhaps we should do the same for S3?
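The "lock" semantics above amount to create-if-absent: the first writer to claim a path wins, and later writers are rejected. Here is a purely illustrative sketch of that contract using an in-memory map in place of a DynamoDB conditional write; the class, method, and OEMI names are hypothetical, not from the proposal.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CreateIfAbsentDemo {
    // In-memory stand-in for the metadata store (in practice, a DynamoDB
    // table where the put is conditional on the row not existing yet).
    static final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    /** Returns true only if this client won the race to create the path. */
    static boolean tryCreate(String path, String oemi) {
        return store.putIfAbsent(path, oemi) == null;
    }

    public static void main(String[] args) {
        System.out.println(tryCreate("s3a://bucket/key1", "OEMI-A")); // true: userA wins
        System.out.println(tryCreate("s3a://bucket/key1", "OEMI-B")); // false: userB is rejected
        System.out.println(store.get("s3a://bucket/key1"));           // OEMI-A
    }
}
```

The same first-writer-wins behavior is what the DynamoDB row lock would provide, and it mirrors how HDFS INode creation blocks a second client from creating the same file.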

Also since the scope of the encryption zone is the bucket, we could get by with a very low
provisioned I/O budget on the Dynamo table and save money, no?
Yea, we should be able to; I believe the only requirement is that the table supports inserts
and reads.  IIRC, each bucket gets its own S3a JVM (or something to that effect), so at least
at startup we can cache its EZ information.

> Encrypt S3A data client-side with Hadoop libraries & Hadoop KMS
> ---------------------------------------------------------------
>                 Key: HADOOP-15006
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15006
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3, kms
>            Reporter: Steve Moist
>            Priority: Minor
>         Attachments: S3-CSE Proposal.pdf
> This is for the proposal to introduce Client Side Encryption to S3 in such a way that
> it can leverage HDFS transparent encryption, use the Hadoop KMS to manage keys, use the `hdfs
> crypto` command line tools to manage encryption zones in the cloud, and enable distcp to copy
> from HDFS to S3 (and vice-versa) with data still encrypted.
