spark-user mailing list archives

From Christophe Préaud <>
Subject Application failure in yarn-cluster mode
Date Fri, 10 Oct 2014 16:24:28 GMT

After updating from spark-1.0.0 to spark-1.1.0, my Spark applications fail most of the time
(but not always) in yarn-cluster mode, though never in yarn-client mode.

Here is my configuration:

 *   spark-1.1.0
 *   hadoop-2.2.0

And here is the hadoop.tmp.dir definition from the Hadoop core-site.xml file (each directory
is on its own partition):
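The definition itself did not survive in the archive; the fragment below is reconstructed
from the directory paths mentioned later in this message, so treat it as illustrative rather
than the actual configuration:

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <!-- Illustrative only: one comma-separated entry per partition, as described above -->
  <value>/d1/yarn/local,/d2/yarn/local,/d3/yarn/local</value>
</property>
```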

After investigating, it turns out that the problem occurs when an executor fetches a jar file:
the jar is downloaded to a temporary file, always under /d1/yarn/local (see the hadoop.tmp.dir
definition above), and then moved to one of the temporary directories defined in hadoop.tmp.dir:

 *   if the target directory is the same as the one holding the temporary file (i.e. /d1/yarn/local),
the application continues normally
 *   if it is another one (e.g. /d2/yarn/local, /d3/yarn/local, ...), the task fails with the following
exception:

14/10/10 14:33:51 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 0) ./logReader-1.0.10.jar (Permission denied)
        at Method)
        at org.apache.spark.util.Utils$.fetchFile(Utils.scala:440)
        at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:325)
        at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:323)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.executor.Executor$
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$

I have no idea why the move fails when the source and target files are not on the same partition.
For the moment I have worked around the problem with the attached patch, which ensures that
the temp file and its final destination are always on the same partition.
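For what it's worth, a likely culprit (my assumption, not confirmed against the Spark source
here) is that a plain java.io.File.renameTo returns false when source and destination sit on
different partitions, since it cannot move data across filesystem boundaries. A minimal sketch
of a rename-with-fallback in that spirit (the class name and paths are hypothetical, and this
is not the attached patch):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class SafeMove {
    // Move src to dst. File.renameTo may return false when src and dst are
    // on different partitions, so fall back to an explicit copy-and-delete.
    static void move(File src, File dst) throws IOException {
        if (!src.renameTo(dst)) {
            Files.copy(src.toPath(), dst.toPath(), StandardCopyOption.REPLACE_EXISTING);
            if (!src.delete()) {
                throw new IOException("could not delete " + src);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Same-partition demo: rename succeeds directly, no fallback needed.
        File src = File.createTempFile("fetch", ".jar");
        Files.write(src.toPath(), "payload".getBytes());
        File dst = new File(src.getParent(), "moved.jar");
        dst.deleteOnExit();
        move(src, dst);
        System.out.println(dst.exists() && !src.exists()); // prints true
    }
}
```

Files.move with copy semantics (or copy-then-delete as above) handles the cross-partition
case that renameTo cannot.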

Any thoughts on this problem?


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively for their
addressees. If you are not the intended recipient of this message, please delete it
and notify the sender.
