spark-user mailing list archives

From "Chen, Kevin" <>
Subject Re: SaveAsTextFile to S3 bucket
Date Tue, 27 Jan 2015 05:33:38 GMT
When Spark saves an RDD to a text file, the output directory must not exist beforehand. Spark creates the
directory and writes the data to part-00000 under it. In my use case, I created a
directory dev in the bucket ://nexgen-software/dev . I expected Spark to create an output directory under
dev, with a part-00000 file under output. But it threw an exception because I had only given write permission
to dev, not to the bucket. When I opened up write permission on the bucket, it worked. But it did not
create the output directory under dev; instead it created another dev/output directory under the bucket.
I just want to know whether it is possible to have the output directory created under the dev directory
I created beforehand.
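
For illustration only, here is a minimal Python sketch of how an S3-style output path breaks down into a bucket and an object key. It relies on one known property of S3: objects live under flat keys, and a "directory" such as dev is just a key prefix. That may be why dev/output shows up as a prefix in the bucket rather than as something nested inside a separately created dev folder. The "s3n" scheme, the bucket name example-bucket, and the helper names split_s3_uri and part_file_key are assumptions made up for this sketch; they are not taken from the thread and are not part of Spark or Hadoop.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an S3-style URI into (bucket, key prefix)."""
    parsed = urlparse(uri)
    # The host part of the URI is the bucket; the remainder is a key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


def part_file_key(uri, partition=0):
    """Return (bucket, key) of the part file expected for one partition."""
    bucket, prefix = split_s3_uri(uri)
    # Part files are named part-00000, part-00001, ... under the output prefix.
    return bucket, f"{prefix.rstrip('/')}/part-{partition:05d}"


# Hypothetical URI: the scheme and bucket name are assumptions.
bucket, key = part_file_key("s3n://example-bucket/dev/output")
print(bucket, key)  # prints: example-bucket dev/output/part-00000
```

In this reading, the part file is a single object whose key starts with dev/output/, stored directly in the bucket, so bucket-level permissions can matter even when only the dev prefix is of interest.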

From: Nick Pentreath <<>>
Date: Monday, January 26, 2015 9:15 PM
To: "<>" <<>>
Subject: Re: SaveAsTextFile to S3 bucket

Your output folder specifies


So it will try to write to /dev/output which is as expected. If you create the directory /dev/output
upfront in your bucket, and try to save it to that (empty) directory, what is the behaviour?

On Tue, Jan 27, 2015 at 6:21 AM, Chen, Kevin <<>>
Does anyone know if I can save an RDD as a text file to a pre-created directory in an S3 bucket?

I have a directory created in an S3 bucket: //nexgen-software/dev

When I tried to save an RDD as a text file in this directory:

I got the following exception at runtime:

Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException:
S3 HEAD request failed for '/dev' - ResponseCode=403, ResponseMessage=Forbidden

I have verified /dev has write permission. However, if I grant the bucket //nexgen-software
write permission, I don't get the exception. But the output is not created under dev. Rather,
a different /dev/output directory is created directly in the bucket (//nexgen-software).
Is this how saveAsTextFile behaves in S3? Is there any way I can have the output created under
a pre-defined directory?

Thanks in advance.
