spark-user mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: Accessing s3a files from Spark
Date Tue, 31 May 2016 09:29:41 GMT
which s3 endpoint?
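
(For context, and only as an assumption on my part: buckets in regions that accept only V4 request signing generally need fs.s3a.endpoint pointed at the region-specific host. A minimal sketch, with the endpoint value chosen purely as an example:)

// Sketch only: the endpoint value is a placeholder; use your bucket's region-specific host.
sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")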



On 29 May 2016, at 22:55, Mayuresh Kunjir <mayuresh@cs.duke.edu> wrote:

I'm running into permission issues while accessing data in an S3 bucket through the s3a file
system from a local Spark cluster. Has anyone found success with this?

My setup is:
- Spark 1.6.1 compiled against Hadoop 2.7.2
- aws-java-sdk-1.7.4.jar and hadoop-aws-2.7.2.jar in the classpath
- Spark's Hadoop configuration is as follows:
sc.hadoopConfiguration.set("fs.s3a.impl","org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.access.key", <access>)
sc.hadoopConfiguration.set("fs.s3a.secret.key", <secret>)
(The secret key does not contain any '/' characters, which others have reported to cause
issues. An alternative way of passing these settings is sketched right after this list.)
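
A minimal sketch of the same settings passed through SparkConf instead, so they are part of the Hadoop configuration when the context is created; the app name is arbitrary and the credential values are read from environment variables purely as placeholders:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: spark.hadoop.* properties are copied into the Hadoop configuration.
// Credential values here are placeholders pulled from environment variables.
val conf = new SparkConf()
  .setAppName("s3a-read-test")
  .set("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
  .set("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
val sc = new SparkContext(conf)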

I have configured my S3 bucket to grant the necessary permissions. (https://sparkour.urizone.net/recipes/configuring-s3/)

What works: listing, reading from, and writing to s3a using the hadoop command line, e.g.
hadoop dfs -ls s3a://<bucket name>/<file path>

What doesn't work: reading from s3a using Spark's textFile API. Each task throws an exception
reporting *Forbidden Access (403)*.
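
(For reference, the failing read is roughly the following; the bucket name and path are placeholders:)

// Hypothetical reproduction of the failing read; bucket name and path are placeholders.
val lines = sc.textFile("s3a://<bucket name>/<file path>")
lines.count()   // each task fails with the 403 from S3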

Some online documents suggest using IAM roles to grant permissions for a cluster running on
AWS, but I would like a solution for my local standalone cluster.

Any help would be appreciated.

Regards,
~Mayuresh

