spark-user mailing list archives

From Lohith Samaga M <Lohith.Sam...@mphasis.com>
Subject RE: Cluster mode deployment from jar in S3
Date Mon, 04 Jul 2016 09:50:50 GMT
Hi,
                The aws CLI already has your access key ID and secret access key from when you
initially configured it.
                Is your S3 bucket accessible without any access restrictions?
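
For example, the following should show which credentials the CLI picked up and whether
the jar is readable with them (bucket and path below are placeholders):

aws configure list
aws s3 ls s3://bucket/dir/foo.jar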


Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga


From: Ashic Mahtab [mailto:ashic@live.com]
Sent: Monday, July 04, 2016 15.06
To: Apache Spark
Subject: RE: Cluster mode deployment from jar in S3

Sorry to do this...but... *bump*

________________________________
From: ashic@live.com
To: user@spark.apache.org
Subject: Cluster mode deployment from jar in S3
Date: Fri, 1 Jul 2016 17:45:12 +0100
Hello,
I've got a Spark stand-alone cluster using EC2 instances. I can submit jobs using "--deploy-mode
client", but using "--deploy-mode cluster" is proving to be a challenge. I've tried this:

spark-submit --class foo --master spark://master-ip:7077 --deploy-mode cluster s3://bucket/dir/foo.jar

When I do this, I get:
16/07/01 16:23:16 ERROR ClientEndpoint: Exception from cluster was: java.lang.IllegalArgumentException:
AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively)
of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties
(respectively).
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified
as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId
or fs.s3.awsSecretAccessKey properties (respectively).
        at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:82)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)


Now I'm not using any S3 or Hadoop stuff within my code (it's just an sc.parallelize(1 to
100)). So I imagine it's the driver trying to fetch the jar. I haven't set the AWS Access
Key ID and Secret as mentioned, but the role the machines are in allows them to copy the jar.
In other words, this works:

aws s3 cp s3://bucket/dir/foo.jar /tmp/foo.jar
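
For reference, the two mechanisms the error message points at would, as far as I can tell,
look something like the following (placeholder keys, untested):

spark-submit --class foo --master spark://master-ip:7077 --deploy-mode cluster \
  s3://ACCESS_KEY_ID:SECRET_ACCESS_KEY@bucket/dir/foo.jar

spark-submit --class foo --master spark://master-ip:7077 --deploy-mode cluster \
  --conf spark.hadoop.fs.s3.awsAccessKeyId=ACCESS_KEY_ID \
  --conf spark.hadoop.fs.s3.awsSecretAccessKey=SECRET_ACCESS_KEY \
  s3://bucket/dir/foo.jar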

I'm using Spark 1.6.2, and can't think of a way to submit the jar from S3 using cluster
deploy mode. I've also tried simply downloading the jar onto a node and spark-submitting
that... that works in client mode, but I get a not-found error when using cluster mode.
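
One thing I haven't tried yet is staging the jar at the same local path on every worker and
submitting a file: URL instead, on the assumption that the driver may be launched on any of
them (paths are placeholders):

# on each worker node:
aws s3 cp s3://bucket/dir/foo.jar /tmp/foo.jar

# then, from the submitting machine:
spark-submit --class foo --master spark://master-ip:7077 --deploy-mode cluster file:///tmp/foo.jar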

Any help will be appreciated.

Thanks,
Ashic.
