spark-user mailing list archives

From Ashish Rangole <arang...@gmail.com>
Subject Re: SaveAsTextFile to S3 bucket
Date Tue, 27 Jan 2015 05:30:57 GMT
By default, the files are created under the path passed as the argument to
saveAsTextFile. That path is treated as a folder in the bucket, and the
actual files are created inside it with the naming convention part-nnnnn,
where nnnnn is the zero-padded index of the output partition (for example,
part-00000 for the first partition).
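
As a rough illustration of that naming convention, here is a minimal sketch
in plain Java. It is not Spark's or Hadoop's own code; the class and the
helpers partFileName and expectedOutputPaths are hypothetical names made up
for this example, and the partition count of 3 is an assumption. It only
shows which paths one would expect to see under the output path from the
original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PartFileNames {
    // Hypothetical helper: formats the file name for a zero-based
    // partition index, zero-padded to five digits (part-00000, part-00001).
    static String partFileName(int partitionIndex) {
        return String.format(Locale.ROOT, "part-%05d", partitionIndex);
    }

    // Hypothetical helper: lists the paths expected under the output path,
    // one per output partition. The output path acts as the folder.
    static List<String> expectedOutputPaths(String outputPath, int numPartitions) {
        String base = outputPath.endsWith("/") ? outputPath : outputPath + "/";
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            paths.add(base + partFileName(i));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Assumption for illustration: the RDD has 3 output partitions.
        for (String path : expectedOutputPaths("s3n://nexgen-software/dev/output", 3)) {
            System.out.println(path);
        }
    }
}
```

So the argument to saveAsTextFile names the folder, never a single output
file, and the number of part files follows the number of partitions in the
RDD being saved.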

On Mon, Jan 26, 2015 at 9:15 PM, Nick Pentreath <nick.pentreath@gmail.com>
wrote:

> Your output folder specifies
>
> rdd.saveAsTextFile("s3n://nexgen-software/dev/output");
>
> So it will try to write to /dev/output which is as expected. If you create
> the directory /dev/output upfront in your bucket, and try to save it to
> that (empty) directory, what is the behaviour?
>
> On Tue, Jan 27, 2015 at 6:21 AM, Chen, Kevin <Kevin.Chen@neustar.biz>
> wrote:
>
>>  Does anyone know if I can save an RDD as a text file to a pre-created
>> directory in an S3 bucket?
>>
>>  I have a directory created in the S3 bucket: //nexgen-software/dev
>>
>>  When I tried to save an RDD as a text file in this directory:
>> rdd.saveAsTextFile("s3n://nexgen-software/dev/output");
>>
>>
>>  I got the following exception at runtime:
>>
>> Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception:
>> org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/dev' -
>> ResponseCode=403, ResponseMessage=Forbidden
>>
>>
>>  I have verified that /dev has write permission. However, if I grant the
>> bucket //nexgen-software write permission, I don't get the exception. But
>> the output is not created under dev. Rather, a different /dev/output
>> directory is created directly in the bucket (//nexgen-software). Is this how
>> saveAsTextFile behaves in S3? Is there any way I can have the output created
>> under a predefined directory?
>>
>>
>>  Thanks in advance.
>>
>
