spark-user mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: Access several s3 buckets, with credentials containing "/"
Date Fri, 05 Jun 2015 10:55:20 GMT

> On 5 Jun 2015, at 08:03, Pierre B <pierre.borckmans@realimpactanalytics.com> wrote:
> 
> Hi list!
> 
> My problem is quite simple.
> I need to access several S3 buckets, using different credentials:
> ```
> val c1 =
> sc.textFile("s3n://[ACCESS_KEY_ID1:SECRET_ACCESS_KEY1]@bucket1/file.csv").count
> val c2 =
> sc.textFile("s3n://[ACCESS_KEY_ID2:SECRET_ACCESS_KEY2]@bucket2/file.csv").count
> val c3 =
> sc.textFile("s3n://[ACCESS_KEY_ID3:SECRET_ACCESS_KEY3]@bucket3/file.csv").count
> ...
> ```
> 
> One/several of those AWS credentials might contain "/" in the private access
> key.
> This is a known problem and from my research, the only ways to deal with
> these "/" are:
> 1/ use environment variables to set the AWS credentials, then access the s3
> buckets without specifying the credentials
> 2/ set the hadoop configuration to contain the credentials.
> 
> However, none of these solutions allow me to access different buckets, with
> different credentials.
> 
> Can anyone help me on this?
> 
> Thanks
> 
> Pierre

This is a long-known outstanding bug in Hadoop s3n that nobody has ever sat down to fix. One subtlety is
that it's really hard to test, as you need credentials with a / in them.

The general best practice is to recreate your credentials until you get a secret key without a / in it.
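
Until the bug is fixed, one way to catch an affected key early is to check it before building the s3n URL. The following is only a minimal sketch: `S3nUrls`, `isUrlSafe` and `build` are hypothetical names, not part of Spark or Hadoop, and it assumes the `s3n://ACCESS_KEY:SECRET_KEY@bucket/path` form used in the question.

```scala
// Hypothetical helper for illustration; not part of Spark or Hadoop.
object S3nUrls {

  /** True if the secret key has no "/" and so avoids the s3n URL problem described above. */
  def isUrlSafe(secretKey: String): Boolean = !secretKey.contains("/")

  /**
   * Builds s3n://ACCESS_KEY:SECRET_KEY@bucket/path.
   * Returns Left with a reason when the secret key contains "/",
   * so the caller can regenerate the key pair instead of failing at read time.
   */
  def build(accessKey: String,
            secretKey: String,
            bucket: String,
            path: String): Either[String, String] =
    if (isUrlSafe(secretKey))
      Right(s"s3n://${accessKey}:${secretKey}@${bucket}/${path.stripPrefix("/")}")
    else
      Left(s"secret key for bucket '${bucket}' contains '/'; regenerate the key pair")
}
```

A `Right` value could then be passed to `sc.textFile`, while a `Left` signals that the credentials for that bucket should be recreated first.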

Now, if you can get the patch to work against hadoop trunk, I promise I will commit it
https://issues.apache.org/jira/browse/HADOOP-3733
