spark-issues mailing list archives

From "Aseem Bansal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-17307) Document what all access is needed on S3 bucket when trying to save a model
Date Thu, 01 Sep 2016 08:48:22 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454791#comment-15454791 ]

Aseem Bansal commented on SPARK-17307:
--------------------------------------

I would add that bit of information at http://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/util/MLWritable.html#save(java.lang.String)

Something like "saving to S3 needs complete read, write, and delete access on the bucket" should be enough.
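
Not doc wording, but here is a minimal sketch of how one could check up front that a bucket grants the write and delete access that save() needs. It assumes the same jsc and credential setup as the reproduction code quoted below; the probe path is a made-up name.

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = jsc.hadoopConfiguration();
FileSystem fs = FileSystem.get(URI.create("s3n://<BUCKET>"), conf);

// Writing an object exercises the PUT/write permission that save() needs.
Path probe = new Path("s3n://<BUCKET>/dev-qa/_permissionProbe");
try (FSDataOutputStream out = fs.create(probe, true)) {
    out.writeBytes("probe");
}

// Deleting it exercises the delete permission that save() uses to clean
// up its temporary files.
fs.delete(probe, false);
{code}

If either call fails with an access-denied error, the bucket is missing one of the required permissions before the model save is even attempted.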

> Document what all access is needed on S3 bucket when trying to save a model
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-17307
>                 URL: https://issues.apache.org/jira/browse/SPARK-17307
>             Project: Spark
>          Issue Type: Documentation
>            Reporter: Aseem Bansal
>            Priority: Minor
>
> I ran into this lack of documentation when trying to save a model to S3. Initially I thought
> write access alone should be enough. Then I found that Spark also needs delete access to remove
> temporary files. So I requested delete access and tried again, and now I get the error
> Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException:
> S3 PUT failed for '/dev-qa_%24folder%24' XML Error Message
> To reproduce this error, the code below can be used:
> {code}
> SparkSession sparkSession = SparkSession
>         .builder()
>         .appName("my app")
>         .master("local")
>         .getOrCreate();
> JavaSparkContext jsc = new JavaSparkContext(sparkSession.sparkContext());
> jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "<ACCESS_KEY>");
> jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "<SECRET_ACCESS_KEY>");
> // Build or load a PipelineModel (elided), then attempt the save that fails.
> pipelineModel.write().overwrite().save("s3n://<BUCKET>/dev-qa/modelTest");
> {code}
> This back and forth could be avoided if the documentation clearly stated what access Spark
> needs in order to write to S3. It would also be great to explain why each permission is needed.



