spark-user mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: Access S3 buckets in multiple accounts
Date Wed, 28 Sep 2016 14:03:06 GMT

On 27 Sep 2016, at 15:53, Daniel Siegmann <dsiegmann@securityscorecard.io> wrote:

I am running Spark on Amazon EMR and writing data to an S3 bucket. However, the data is read
from an S3 bucket in a separate AWS account. Setting the fs.s3a.access.key and fs.s3a.secret.key
values is sufficient to get access to the other account (using the s3a protocol); however,
I then won't have access to the S3 bucket in the EMR cluster's AWS account.

Is there any way for Spark to access S3 buckets in multiple accounts? If not, is there any
best practice for how to work around this?



There are two ways to do this without changing permissions:

1. Different implementations: use s3a for one and s3n for the other, and give them different
secrets.
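
As a sketch of option 1, a spark-defaults.conf could carry a separate set of secrets for each connector (the key names are the real s3a/s3n properties; the values here are placeholders):

```
# s3a credentials: the external account's bucket (placeholder values)
spark.hadoop.fs.s3a.access.key           AKIAEXTERNALEXAMPLE
spark.hadoop.fs.s3a.secret.key           external-account-secret

# s3n credentials: the EMR cluster's own account (placeholder values)
spark.hadoop.fs.s3n.awsAccessKeyId       AKIACLUSTEREXAMPLE
spark.hadoop.fs.s3n.awsSecretAccessKey   cluster-account-secret
```

You then read one bucket via s3a:// paths and the other via s3n:// paths, so each request picks up the matching credentials.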

2. Insecure: put the secrets in the URI: s3a://AWSID:escaped-secret@bucket/path. This
leaks your secrets throughout the logs, and has problems with "/" in the password. If there
is one, you'll probably need to regenerate the password.
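
If you do go the in-URI route, the secret has to be percent-encoded so that any "/" in it doesn't break URI parsing. A minimal sketch (the credentials and bucket name are made up; never log a real secret):

```python
from urllib.parse import quote

# Hypothetical credentials -- illustration only.
access_key = "AKIAEXAMPLE"
secret_key = "abc/def+ghi"

# Percent-encode everything, including "/" and "+", with safe=""
escaped = quote(secret_key, safe="")  # -> "abc%2Fdef%2Bghi"

uri = f"s3a://{access_key}:{escaped}@my-bucket/path"
```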

This is going to have to be fixed in the s3a implementation at some point, as it's not only
needed for cross-user auth: once you switch to v4 AWS auth, you need to specify the appropriate
S3 endpoint for your region. You can't just use s3 central, but need to choose s3 Frankfurt,
s3 Seoul, etc., so you won't be able to work with data across regions.
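
For reference, the region-specific endpoint goes into fs.s3a.endpoint; for example, for Frankfurt (eu-central-1, a v4-only region):

```
spark.hadoop.fs.s3a.endpoint   s3.eu-central-1.amazonaws.com
```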
