spark-user mailing list archives

From Ashic Mahtab <as...@live.com>
Subject Cluster mode deployment from jar in S3
Date Fri, 01 Jul 2016 16:45:12 GMT
Hello,

I've got a Spark standalone cluster running on EC2 instances. I can submit jobs using "--deploy-mode client"; however, "--deploy-mode cluster" is proving to be a challenge. I've tried this:

spark-submit --class foo --master spark://master-ip:7077 --deploy-mode cluster s3://bucket/dir/foo.jar
When I do this, I get:

16/07/01 16:23:16 ERROR ClientEndpoint: Exception from cluster was: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
        at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:82)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)

Now, I'm not using any S3 or Hadoop APIs within my code (it's just an sc.parallelize(1 to 100)), so I imagine it's the driver trying to fetch the jar. I haven't set the AWS Access Key ID and Secret as mentioned, but the IAM role the machines are in allows them to copy the jar. In other words, this works:

aws s3 cp s3://bucket/dir/foo.jar /tmp/foo.jar
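The error suggests supplying keys explicitly instead. A submit along those lines would presumably look like the sketch below (the key values are placeholders, and I'm not certain the spark.hadoop.* properties even reach the worker that downloads the driver jar), though I'd rather not embed credentials when the instances already have the role:

# Hypothetical sketch: forward the fs.s3 credentials the error asks for
# via spark.hadoop.*, which Spark copies into the Hadoop configuration.
# AKIA_PLACEHOLDER and SECRET_PLACEHOLDER are not real values.
spark-submit --class foo \
  --master spark://master-ip:7077 \
  --deploy-mode cluster \
  --conf spark.hadoop.fs.s3.awsAccessKeyId=AKIA_PLACEHOLDER \
  --conf spark.hadoop.fs.s3.awsSecretAccessKey=SECRET_PLACEHOLDER \
  s3://bucket/dir/foo.jar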
I'm using Spark 1.6.2 and can't think of what else I can do to submit the jar from S3 using cluster deploy mode. I've also tried simply downloading the jar onto a node and spark-submitting that; it works in client mode, but I get a "not found" error when using cluster mode.
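For completeness, the local-jar attempt was along these lines (the path is illustrative, matching the aws s3 cp above); my understanding is that in standalone cluster mode the driver is launched on an arbitrary worker, so a local path only resolves if the jar sits at that same path on every worker:

# Works with --deploy-mode client; fails with "not found" in cluster mode,
# presumably because the worker chosen to run the driver lacks the jar.
spark-submit --class foo \
  --master spark://master-ip:7077 \
  --deploy-mode cluster \
  /tmp/foo.jar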
Any help will be appreciated.
Thanks,
Ashic.