spark-user mailing list archives

From Sujit Pal <sujitatgt...@gmail.com>
Subject Re: Access several s3 buckets, with credentials containing "/"
Date Sat, 06 Jun 2015 17:34:45 GMT
Hi Pierre,

One way is to recreate your credentials until AWS generates a secret key
without a slash character in it. Another way, which I've been using, is to
pass the credentials outside the S3 file path by setting the following
(where sc is the PySpark SparkContext):

    sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", ACCESS_KEY)
    sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", SECRET_KEY)

After that you can define the RDDs more simply:

    c1 = sc.textFile("s3n://bucket1/file.csv")
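
For the several-buckets case in the original question, the same two settings could in principle be changed before each read. The following is only a minimal sketch under stated assumptions, not something verified against a real cluster: the helper names `set_s3n_credentials` and `count_lines` are hypothetical, `sc` is assumed to be a PySpark SparkContext, and whether a later change of credentials takes effect may depend on the Hadoop version and on its filesystem caching. The sketch calls `count()` immediately so that the read happens while the matching credentials are still set.

```python
def set_s3n_credentials(sc, access_key, secret_key):
    """Store s3n credentials in the Hadoop configuration behind `sc`.

    Hypothetical helper: it only wraps the two set() calls shown above.
    """
    conf = sc._jsc.hadoopConfiguration()
    conf.set("fs.s3n.awsAccessKeyId", access_key)
    conf.set("fs.s3n.awsSecretAccessKey", secret_key)


def count_lines(sc, path, access_key, secret_key):
    """Set the credentials, then count the lines of `path` right away.

    count() is an action, so the file is read before the caller moves on
    to the next bucket and overwrites the credentials. Assumption: the
    credentials in effect at this point are the ones used for the read.
    """
    set_s3n_credentials(sc, access_key, secret_key)
    return sc.textFile(path).count()
```

A call would then look like `count_lines(sc, "s3n://bucket1/file.csv", ACCESS_KEY_ID1, SECRET_ACCESS_KEY1)`, repeated with the second and third key pairs for the other buckets. Because the keys never appear in the URL, a "/" in the secret key is not parsed as part of the path.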

-sujit



On Fri, Jun 5, 2015 at 3:55 AM, Steve Loughran <stevel@hortonworks.com>
wrote:

>
> > On 5 Jun 2015, at 08:03, Pierre B <pierre.borckmans@realimpactanalytics.com> wrote:
> >
> > Hi list!
> >
> > My problem is quite simple.
> > I need to access several S3 buckets, using different credentials:
> > ```
> > val c1 = sc.textFile("s3n://[ACCESS_KEY_ID1:SECRET_ACCESS_KEY1]@bucket1/file.csv").count
> > val c2 = sc.textFile("s3n://[ACCESS_KEY_ID2:SECRET_ACCESS_KEY2]@bucket2/file.csv").count
> > val c3 = sc.textFile("s3n://[ACCESS_KEY_ID3:SECRET_ACCESS_KEY3]@bucket3/file.csv").count
> > ...
> > ```
> >
> > One or several of those AWS credentials might contain "/" in the secret
> > access key.
> > This is a known problem and, from my research, the only ways to deal with
> > these "/" are:
> > 1/ use environment variables to set the AWS credentials, then access the
> > s3 buckets without specifying the credentials
> > 2/ set the hadoop configuration to contain the credentials.
> >
> > However, neither of these solutions allows me to access different buckets
> > with different credentials.
> >
> > Can anyone help me on this?
> >
> > Thanks
> >
> > Pierre
>
> This is a long-known outstanding bug in Hadoop s3n that nobody has ever sat
> down to fix. One subtlety is that it's really hard to test, as you need
> credentials with a / in them.
>
> The general best practise is to recreate your credentials.
>
> Now, if you can get the patch to work against hadoop trunk, I promise I
> will commit it:
> https://issues.apache.org/jira/browse/HADOOP-3733
>
