S3 does not have the concept of a "directory". An S3 bucket only holds files (objects). The Hadoop filesystem is mapped onto a bucket and uses Hadoop-specific (or rather S3-tool-specific: s3n uses the jets3t library) conventions (hacks) to fake directories, such as a key ending with a slash ("filename/") or, with s3n, "filename_$folder$" (these are leaky abstractions; google that if you ever have some spare time :p). S3 itself doesn't (and shouldn't) know about these conventions. Again, a bucket just holds a huge number of files. This might seem inconvenient, but directories are a really bad idea for scalable storage. However, setting "folder-like" permissions can be done through IAM: http://docs.aws.amazon.com/AmazonS3/latest/dev/example-policies-s3.html#iam-policy-ex1
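To make the flat-namespace point concrete, here is a minimal sketch (plain Python, not real S3 client code) that models a bucket the way S3 actually stores it: a flat map from object keys to data, where "directories" are nothing but a naming convention on key prefixes.

```python
# Sketch: an S3 bucket modeled as a flat key -> data map.
# The keys below are illustrative, mirroring the thread's example.
bucket = {
    "dev_$folder$": b"",              # s3n's fake-directory marker for "dev"
    "dev/output/part-00000": b"data", # an ordinary object; no real hierarchy
}

def list_prefix(bucket, prefix):
    """Fake a directory listing: tools emulate directories by
    listing all keys that share a prefix."""
    return sorted(k for k in bucket if k.startswith(prefix))

# Listing the "dev/" prefix finds the part file...
print(list_prefix(bucket, "dev/"))   # ['dev/output/part-00000']

# ...but there is no object named "dev/output" at all, and the
# "dev" marker object is unrelated to keys that merely share its prefix.
print("dev/output" in bucket)        # False
```

This is why ACLs set on the "dev" marker object don't propagate: S3 sees no parent/child relationship between keys, only shared prefixes.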

Summarizing: by setting permissions on /dev you set permissions on that object only. It has no effect on the file /dev/output, which, as far as S3 is concerned, is just another object that happens to share part of its object name with /dev.
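Following the pattern in the AWS example linked above, an IAM policy along these lines grants "folder-like" access under the dev/ prefix (a sketch only; the bucket name comes from this thread, and you should adapt the action list and may need to cover the dev marker object itself for the HEAD request to succeed):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::nexgen-software",
      "Condition": {"StringLike": {"s3:prefix": ["dev/*"]}}
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::nexgen-software/dev/*"
    }
  ]
}
```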

Thomas Demoor
skype: demoor.thomas
mobile: +32 497883833

On Tue, Jan 27, 2015 at 6:33 AM, Chen, Kevin <Kevin.Chen@neustar.biz> wrote:
When Spark saves an RDD to a text file, the directory must not exist upfront. It will create the directory and write the data to part-0000 under it. In my use case, I created a directory dev in the bucket ://nexgen-software/dev . I expected it to create output directly under dev and a part-0000 under output. But it gave me an exception, as I only gave write permission to dev, not the bucket. If I open up write permission on the bucket, it works. But it did not create the output directory under dev; rather, it created another dev/output directory under the bucket. I just want to know if it is possible to have the output directory created under the dev directory I created upfront.

From: Nick Pentreath <nick.pentreath@gmail.com>
Date: Monday, January 26, 2015 9:15 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: SaveAsTextFile to S3 bucket

Your output path is specified as

rdd.saveAsTextFile("s3n://nexgen-software/dev/output");

So it will try to write to /dev/output, which is as expected. If you create the directory /dev/output upfront in your bucket and try to save to that (empty) directory, what is the behaviour?

On Tue, Jan 27, 2015 at 6:21 AM, Chen, Kevin <Kevin.Chen@neustar.biz> wrote:
Does anyone know if I can save an RDD as a text file to a pre-created directory in an S3 bucket?

I have a directory created in an S3 bucket: //nexgen-software/dev

When I tried to save the RDD as a text file in this directory:
rdd.saveAsTextFile("s3n://nexgen-software/dev/output");


I got following exception at runtime:

Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/dev' - ResponseCode=403, ResponseMessage=Forbidden


I have verified that /dev has write permission. However, if I grant the bucket //nexgen-software write permission, I don't get the exception. But the output is not created under dev; rather, a different /dev/output directory is created in the bucket (//nexgen-software). Is this how saveAsTextFile behaves on S3? Is there any way I can have the output created under a pre-defined directory?


Thanks in advance.