spark-user mailing list archives

From "Chen, Kevin" <Kevin.C...@neustar.biz>
Subject Re: SaveAsTextFile to S3 bucket
Date Tue, 27 Jan 2015 05:33:38 GMT
When Spark saves an RDD to a text file, the output directory must not already exist. Spark creates
the directory and writes the data to part-0000 under it. In my use case, I created a
directory dev in the bucket ://nexgen-software/dev . I expected Spark to create output directly under
dev, with part-0000 under output. Instead it threw an exception, because I had granted write permission
only to dev, not to the bucket. Once I granted write permission on the bucket, the save worked. But it did
not create the output directory under my dev; it created another dev/output directory in the bucket.
I just want to know whether it is possible to have the output directory created under the dev directory
that I created up front.
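
The 403 in the original report was for '/dev', not for '/dev/output', which suggests that the
s3n layer looks at the parent of the output path as well as the path itself. The thread does not
confirm this; it is an inference from the exception text. Under that assumption, a minimal sketch
of which keys would be touched for a given output key (the class and method names here are
hypothetical, made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

final class AncestorKeys {
    // Lists every prefix of an S3 key from the top down,
    // e.g. "dev/output" yields "dev" and then "dev/output".
    // Assumption: the filesystem layer probes each of these in turn.
    static List<String> ancestors(String key) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String part : key.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty piece produced by a leading slash
            }
            if (current.length() > 0) {
                current.append('/');
            }
            current.append(part);
            result.add(current.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Key taken from the saveAsTextFile call in this thread.
        System.out.println(ancestors("dev/output"));
    }
}
```

If that reading is right, the credentials would need access to dev itself, not only to the
objects below it, which would match the 403 going away once the bucket was opened up.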

From: Nick Pentreath <nick.pentreath@gmail.com>
Date: Monday, January 26, 2015 9:15 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: SaveAsTextFile to S3 bucket

Your output folder specifies

rdd.saveAsTextFile("s3n://nexgen-software/dev/output");

So it will try to write to /dev/output, which is as expected. If you create the directory /dev/output
up front in your bucket and try to save to that (empty) directory, what is the behaviour?
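
To make the path handling concrete, here is a small sketch of how that output URI splits into a
bucket and a key prefix. It uses only java.net.URI; the class and method names are hypothetical,
and it says nothing about how the s3n connector parses the path internally.

```java
import java.net.URI;

final class S3PathParts {
    // The bucket is the authority part of the URI.
    static String bucket(String uri) {
        return URI.create(uri).getAuthority();
    }

    // The key prefix is the URI path without its leading slash.
    static String keyPrefix(String uri) {
        String path = URI.create(uri).getPath();
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        // Output path from the saveAsTextFile call in this thread.
        String out = "s3n://nexgen-software/dev/output";
        System.out.println(bucket(out));
        System.out.println(keyPrefix(out));
    }
}
```

For the path in this thread the bucket is nexgen-software and the key prefix is dev/output, so
the part files would end up under dev/output at the top level of the bucket.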

On Tue, Jan 27, 2015 at 6:21 AM, Chen, Kevin <Kevin.Chen@neustar.biz> wrote:
Does anyone know if I can save an RDD as a text file to a pre-created directory in an S3 bucket?

I have a directory created in an S3 bucket: //nexgen-software/dev

When I tried to save an RDD as a text file in this directory:
rdd.saveAsTextFile("s3n://nexgen-software/dev/output");


I got the following exception at runtime:

Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException:
S3 HEAD request failed for '/dev' - ResponseCode=403, ResponseMessage=Forbidden


I have verified that /dev has write permission. However, if I grant the bucket //nexgen-software
write permission, I don't get the exception. But the output is not created under dev. Rather,
a different /dev/output directory is created directly in the bucket (//nexgen-software).
Is this how saveAsTextFile behaves in S3? Is there any way I can have output created under
a pre-defined directory?


Thanks in advance.




