spark-user mailing list archives

From: Dean Wampler
Subject: Re: [Spark on Amazon EMR] : File does not exist: hdfs://ip-x-x-x-x:/.../spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar
Date: Wed, 09 Sep 2015 21:28:44 GMT

If you log into the cluster, do you see the file if you type:

hdfs dfs -ls hdfs://ipx-x-x-x:8020/user/hadoop/.sparkStaging/application_123344567_0018/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

(with the correct server address substituted for "ipx-x-x-x")? If not, is
the server address correct and routable inside the cluster? Recall that EC2
instances have both public and private host names and IP addresses.

Also, is the port number correct for HDFS in the cluster?
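
The checks above could be sketched roughly as follows, run on the master node. This is only a sketch under some assumptions: the URI is the placeholder path from the error message (substitute the real host), and the `hdfs getconf` and `hdfs dfs -test` calls are my suggestion for comparing the configured NameNode address against the one in the error, not something taken from the failing job's logs.

```shell
# Placeholder URI copied from the error message; replace ipx-x-x-x with the real host.
uri='hdfs://ipx-x-x-x:8020/user/hadoop/.sparkStaging/application_123344567_0018/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar'

# Split the URI into authority (host[:port]) and path with plain parameter expansion.
rest="${uri#hdfs://}"
authority="${rest%%/*}"
path="/${rest#*/}"
host="${authority%%:*}"
case "$authority" in
  *:*) port="${authority##*:}" ;;
  *)   port="" ;;   # no explicit port in the URI
esac
echo "host=$host port=$port path=$path"

# Only touch HDFS when the client is actually installed (i.e. on a cluster node).
if command -v hdfs >/dev/null 2>&1; then
  # NameNode URI the cluster is configured with; compare its host and port to the ones above.
  hdfs getconf -confKey fs.defaultFS
  # Exit status 0 means the path exists.
  if hdfs dfs -test -e "$uri"; then
    echo "staging jar exists"
  else
    echo "staging jar missing"
  fi
fi
```

If the host or port printed by `hdfs getconf` differs from the one in the error message, that mismatch is the first thing to look at.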


Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition (O'Reilly)
Typesafe
@deanwampler

On Wed, Sep 9, 2015 at 9:28 AM, shahab wrote:

> Hi,
> I am using Spark on Amazon EMR. So far I have not been able to submit the
> application successfully, and I am not sure what the problem is. In the
> log file I see the following:
> File does not exist:
> hdfs://ipx-x-x-x:8020/user/hadoop/.sparkStaging/application_123344567_0018/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar
> However, even putting spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar in the
> fat jar file didn't solve the problem. I am out of ideas now.
> I want to submit a Spark application as a step, using the AWS web console.
> I submit the application as:
> spark-submit --deploy-mode cluster --class mypack.MyMainClass --master yarn-cluster s3://mybucket/MySparkApp.jar
> Has anyone had a similar problem with EMR?
> best,
> /Shahab
