spark-user mailing list archives

From jcgarciam <jcgarc...@gmail.com>
Subject Restarting a failed Spark streaming job running on top of a yarn cluster
Date Wed, 03 Oct 2018 12:21:28 GMT
Hi Folks,

We have a few Spark streaming jobs running on a YARN cluster, and from
time to time a job needs to be restarted (for example, because it was
killed for some external reason).

Once we submit the new job, we are faced with the following exception:
ERROR spark.SparkContext: Failed to add /mnt/data1/yarn/nm/usercache/spark/appcache/application_1537885048149_15382/container_e82_1537885048149_15382_01_000001/__app__.jar to Spark environment
java.io.FileNotFoundException: Jar /mnt/data1/yarn/nm/usercache/spark/appcache/application_1537885048149_15382/container_e82_1537885048149_15382_01_000001/__app__.jar not found
	at org.apache.spark.SparkContext.addJarFile$1(SparkContext.scala:1807)
	at org.apache.spark.SparkContext.addJar(SparkContext.scala:1835)
	at org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:457)

Of course, we know that *application_1537885048149_15382* corresponds to the
previous job that was killed, and that our YARN cluster cleans up the
usercache directory very often to avoid filling the filesystem with unused
files.
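
To make the mismatch concrete, here is a minimal sketch that pulls the YARN
application ID out of the jar path in the error and compares it with the ID
of the resubmitted job. The helper names and the value of new_app_id are
hypothetical (the new job's ID is not shown in this message); only the jar
path is taken from the log above.

```python
import re
from typing import Optional

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def extract_app_id(path: str) -> Optional[str]:
    """Return the first YARN application ID found in a path, or None."""
    match = APP_ID_PATTERN.search(path)
    return match.group(0) if match else None


def is_stale_reference(jar_path: str, current_app_id: str) -> bool:
    """True when the jar path belongs to a different application than the running one."""
    found = extract_app_id(jar_path)
    return found is not None and found != current_app_id


# Path copied from the FileNotFoundException above.
jar_path = (
    "/mnt/data1/yarn/nm/usercache/spark/appcache/"
    "application_1537885048149_15382/"
    "container_e82_1537885048149_15382_01_000001/__app__.jar"
)

# Hypothetical ID for the resubmitted job; the real value is not in this message.
new_app_id = "application_1537885048149_15400"

print(extract_app_id(jar_path))
print(is_stale_reference(jar_path, new_app_id))
```

A check like this only confirms that the new driver is being handed a jar
path from the old application's container directory; it does not by itself
explain where that path is being restored from.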

However, what would you recommend for long-running jobs that have to be
restarted when the previous context is no longer available because of the
cleanup?


I hope it is clear what I meant; if you need more information, just ask.

Thanks

JC



