spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Accessing s3a files from Spark
Date Wed, 01 Jun 2016 07:13:09 GMT
Hi,

I am sorry; I did read https://wiki.apache.org/hadoop/AmazonS3, which
mentions that s3:// is deprecated. From what I read, s3a is the preferred
way to go.

Of course, I have been using it for writing data from Spark, but not for
reading yet. Let me try that and get back to you.
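
A minimal sketch of what I mean, for reference (the bucket name and paths
here are placeholders, and credentials are assumed to be configured already):

val lines = sc.textFile("s3a://my-bucket/input/")       // read via the s3a connector
lines.saveAsTextFile("s3a://my-bucket/output/")         // write back via the same connector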

Regards,
Gourav Sengupta

On Tue, May 31, 2016 at 12:22 PM, Mayuresh Kunjir <mayuresh@cs.duke.edu>
wrote:

> How do I use it? I'm accessing s3a from Spark's textFile API.
>
> On Tue, May 31, 2016 at 7:16 AM, Deepak Sharma <deepakmca05@gmail.com>
> wrote:
>
>> Hi Mayuresh
>> Instead of s3a, have you tried the https:// URI for the same S3 bucket?
>>
>> HTH
>> Deepak
>>
>> On Tue, May 31, 2016 at 4:41 PM, Mayuresh Kunjir <mayuresh@cs.duke.edu>
>> wrote:
>>
>>>
>>>
>>> On Tue, May 31, 2016 at 5:29 AM, Steve Loughran <stevel@hortonworks.com>
>>> wrote:
>>>
>>>> which s3 endpoint?
>>>>
>>>>
>>> I have tried both s3.amazonaws.com and s3-external-1.amazonaws.com.
>>>
>>>
>>>>
>>>>
>>>> On 29 May 2016, at 22:55, Mayuresh Kunjir <mayuresh@cs.duke.edu> wrote:
>>>>
>>>> I'm running into permission issues while accessing data in an S3 bucket
>>>> via the s3a file system from a local Spark cluster. Has anyone had
>>>> success with this?
>>>>
>>>> My setup is:
>>>> - Spark 1.6.1 compiled against Hadoop 2.7.2
>>>> - aws-java-sdk-1.7.4.jar and hadoop-aws-2.7.2.jar in the classpath
>>>> - Spark's Hadoop configuration is as follows:
>>>>
>>>> sc.hadoopConfiguration.set("fs.s3a.impl","org.apache.hadoop.fs.s3a.S3AFileSystem")
>>>> sc.hadoopConfiguration.set("fs.s3a.access.key", <access>)
>>>> sc.hadoopConfiguration.set("fs.s3a.secret.key", <secret>)
>>>> (The secret key does not contain any '/' characters, which others have
>>>> reported to cause problems.)
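
If the bucket lives in a non-default region, it may also help to pin the
endpoint explicitly; a sketch, assuming a bucket in eu-west-1:

sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3-eu-west-1.amazonaws.com")  // region endpoint is an assumption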
>>>>
>>>> I have configured my S3 bucket to grant the necessary permissions. (
>>>> https://sparkour.urizone.net/recipes/configuring-s3/)
>>>>
>>>> What works: listing, reading from, and writing to s3a using the hadoop
>>>> command line, e.g. hadoop dfs -ls s3a://<bucket name>/<file path>
>>>>
>>>> What doesn't work: reading from s3a using Spark's textFile API. Each
>>>> task throws an exception saying *Forbidden Access (403)*.
>>>>
>>>> Some online documents suggest using IAM roles to grant permissions for
>>>> an AWS cluster, but I would like a solution for my local standalone
>>>> cluster.
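
One thing worth trying on a standalone cluster: pass the keys through
Spark's own configuration so the executors pick them up as well. A sketch,
assuming the keys are exported as environment variables on the driver
(Spark copies spark.hadoop.* properties into the Hadoop configuration):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("s3a-read-test")                                         // placeholder app name
  .set("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))  // copied into Hadoop conf
  .set("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
val sc = new SparkContext(conf)                                        // master supplied by spark-submit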
>>>>
>>>> Any help would be appreciated.
>>>>
>>>> Regards,
>>>> ~Mayuresh
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Thanks
>> Deepak
>> www.bigdatabig.com
>> www.keosha.net
>>
>
>
