I think the jar file has to be local. A jar stored in HDFS is not supported in Spark yet.

See this answer:

http://stackoverflow.com/questions/28739729/spark-submit-not-working-when-application-jar-is-in-hdfs
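For what it's worth, the path in your error (`file:/home/nickt/.../hdfs:/host.domain.ex/...`) has the shape you get when an `hdfs://` string is treated as a plain local file name and resolved against the driver's working directory. The small sketch below reproduces that shape on a Unix-like system using only `java.io.File`, with no Spark involved. The class and method names are made up for illustration. It does not show which part of Spark did the conversion in your case.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper class, for illustration only (not part of Spark).
public class HdfsPathAsLocalFile {

    // Treats `path` as an ordinary local file name relative to `workDir` and
    // returns the resulting file: URI. On Unix-like systems java.io.File
    // collapses the "//" after "hdfs:" into a single "/", which matches the
    // "hdfs:/host.domain.ex" seen in the FileNotFoundException.
    static String asLocalFileUri(String workDir, String path) {
        return new File(workDir, path).toURI().toString();
    }

    public static void main(String[] args) {
        // Values copied from the log in this thread.
        String workDir = "/home/nickt/spark-1.3.0/work/driver-20150329220503-0021";
        String input = "hdfs://host.domain.ex/user/nickt/linkage";

        // Parsed as a URI, the string keeps its hdfs scheme.
        System.out.println(URI.create(input).getScheme());

        // Resolved as a local file name, it becomes the path from the error.
        System.out.println(asLocalFileUri(workDir, input));
    }
}
```

On Linux this prints `hdfs` and then `file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage`.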

> Date: Sun, 29 Mar 2015 22:34:46 -0700
> From: n.e.travers@gmail.com
> To: user@spark.apache.org
> Subject: java.io.FileNotFoundException when using HDFS in cluster mode
>
> Hi List,
>
> I'm following this example here
> <https://github.com/databricks/learning-spark/tree/master/mini-complete-example>
> with the following:
>
> $SPARK_HOME/bin/spark-submit \
> --deploy-mode cluster \
> --master spark://host.domain.ex:7077 \
> --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
> hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar \
> hdfs://host.domain.ex/user/nickt/linkage \
> hdfs://host.domain.ex/user/nickt/wordcounts
>
> The jar is submitted fine and I can see it appear on the driver node (i.e.
> connecting to and reading from HDFS ok):
>
> -rw-r--r-- 1 nickt nickt 15K Mar 29 22:05 learning-spark-mini-example_2.10-0.0.1.jar
> -rw-r--r-- 1 nickt nickt 9.2K Mar 29 22:05 stderr
> -rw-r--r-- 1 nickt nickt 0 Mar 29 22:05 stdout
>
> But it's failing due to a java.io.FileNotFoundException saying my input file
> is missing:
>
> Caused by: java.io.FileNotFoundException: Added file file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage does not exist.
>
> I'm using sc.addFile("hdfs://path/to/the_file.txt") to propagate to all the
> workers and sc.textFile(SparkFiles.get("the_file.txt")) to return the path to
> the file on each of the hosts.
>
> Has anyone come up against this before when reading from HDFS? No doubt I'm
> doing something wrong.
>
> Full trace below:
>
> Launch Command: "/usr/java/java8/bin/java" "-cp"
> ":/home/nickt/spark-1.3.0/conf:/home/nickt/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.6.0.jar"
> "-Dakka.loglevel=WARNING" "-Dspark.driver.supervise=false"
> "-Dspark.app.name=com.oreilly.learningsparkexamples.mini.scala.WordCount"
> "-Dspark.akka.askTimeout=10"
> "-Dspark.jars=hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar"
> "-Dspark.master=spark://host.domain.ex:7077" "-Xms512M" "-Xmx512M"
> "org.apache.spark.deploy.worker.DriverWrapper"
> "akka.tcp://sparkWorker@host5.domain.ex:40830/user/Worker"
> "/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/learning-spark-mini-example_2.10-0.0.1.jar"
> "com.oreilly.learningsparkexamples.mini.scala.WordCount"
> "hdfs://host.domain.ex/user/nickt/linkage"
> "hdfs://host.domain.ex/user/nickt/wordcounts"
> ========================================
>
> log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nickt); users with modify permissions: Set(nickt)
> 15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
> 15/03/29 22:05:05 INFO Utils: Successfully started service 'Driver' on port 44201.
> 15/03/29 22:05:05 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@host5.domain.ex:40830/user/Worker
> 15/03/29 22:05:05 INFO SparkContext: Running Spark version 1.3.0
> 15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nickt); users with modify permissions: Set(nickt)
> 15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
> 15/03/29 22:05:05 INFO Utils: Successfully started service 'sparkDriver' on port 33382.
> 15/03/29 22:05:05 INFO SparkEnv: Registering MapOutputTracker
> 15/03/29 22:05:05 INFO SparkEnv: Registering BlockManagerMaster
> 15/03/29 22:05:05 INFO DiskBlockManager: Created local directory at /tmp/spark-9c52eb1e-92b9-4e3f-b0e9-699a158f8e40/blockmgr-222a2522-a0fc-4535-a939-4c14d92dc666
> 15/03/29 22:05:05 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@host5.domain.ex:40830/user/Worker
> 15/03/29 22:05:05 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
> 15/03/29 22:05:05 INFO HttpFileServer: HTTP File server directory is /tmp/spark-031afddd-2a75-4232-931a-89e502b0d722/httpd-7e22bb57-3cfe-4c89-aaec-4e6ca1a65f66
> 15/03/29 22:05:05 INFO HttpServer: Starting HTTP Server
> 15/03/29 22:05:05 INFO Server: jetty-8.y.z-SNAPSHOT
> 15/03/29 22:05:05 INFO AbstractConnector: Started SocketConnector@0.0.0.0:42484
> 15/03/29 22:05:05 INFO Utils: Successfully started service 'HTTP file server' on port 42484.
> 15/03/29 22:05:05 INFO SparkEnv: Registering OutputCommitCoordinator
> 15/03/29 22:05:06 INFO Server: jetty-8.y.z-SNAPSHOT
> 15/03/29 22:05:06 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
> 15/03/29 22:05:06 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 15/03/29 22:05:06 INFO SparkUI: Started SparkUI at http://host5.domain.ex:4040
> 15/03/29 22:05:06 ERROR SparkContext: Jar not found at target/scala-2.10/learning-spark-mini-example_2.10-0.0.1.jar
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@host.domain.ex:7077/user/Master...
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150329220506-0027
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/0 on worker-20150329112422-host3.domain.ex-33765 (host3.domain.ex:33765) with 64 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/0 on hostPort host3.domain.ex:33765 with 64 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/1 on worker-20150329112422-host6.domain.ex-35464 (host6.domain.ex:35464) with 64 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/1 on hostPort host6.domain.ex:35464 with 64 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/2 on worker-20150329112422-host2.domain.ex-40914 (host2.domain.ex:40914) with 64 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/2 on hostPort host2.domain.ex:40914 with 64 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/3 on worker-20150329112421-host4.domain.ex-35927 (host4.domain.ex:35927) with 64 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/3 on hostPort host4.domain.ex:35927 with 64 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/4 on worker-20150329112422-host1.domain.ex-60546 (host1.domain.ex:60546) with 64 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/4 on hostPort host1.domain.ex:60546 with 64 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/5 on worker-20150329112421-host.domain.ex-59485 (host.domain.ex:59485) with 64 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/5 on hostPort host.domain.ex:59485 with 64 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/6 on worker-20150329112421-host5.domain.ex-40830 (host5.domain.ex:40830) with 63 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/6 on hostPort host5.domain.ex:40830 with 63 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/2 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/0 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/1 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/4 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/3 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/5 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/0 is now RUNNING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/1 is now RUNNING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/2 is now RUNNING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/6 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/3 is now RUNNING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/4 is now RUNNING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/5 is now RUNNING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/6 is now RUNNING
> 15/03/29 22:05:06 INFO NettyBlockTransferService: Server created on 39447
> 15/03/29 22:05:06 INFO BlockManagerMaster: Trying to register BlockManager
> 15/03/29 22:05:06 INFO BlockManagerMasterActor: Registering block manager host5.domain.ex:39447 with 265.1 MB RAM, BlockManagerId(<driver>, host5.domain.ex, 39447)
> 15/03/29 22:05:06 INFO BlockManagerMaster: Registered BlockManager
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
> Exception in thread "main" java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:483)
>     at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:59)
>     at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> Caused by: java.io.FileNotFoundException: Added file file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage does not exist.
>     at org.apache.spark.SparkContext.addFile(SparkContext.scala:1089)
>     at org.apache.spark.SparkContext.addFile(SparkContext.scala:1065)
>     at com.oreilly.learningsparkexamples.mini.scala.WordCount$.main(WordCount.scala:21)
>     at com.oreilly.learningsparkexamples.mini.scala.WordCount.main(WordCount.scala)
>     ... 6 more
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-io-FileNotFoundException-when-using-HDFS-in-cluster-mode-tp22287.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.