spark-user mailing list archives

From Anders Arpteg <arp...@spotify.com>
Subject Missing shuffle files
Date Sat, 21 Feb 2015 16:20:10 GMT
For large jobs, I get the following error, which seems to indicate that
shuffle files are missing for some reason. It's a rather large job with
many partitions. If the data size is reduced, the problem disappears. I'm
running a build from Spark master post 1.2 (built on 2015-01-16) on
YARN 2.2. Any idea how to resolve this problem?

User class threw exception: Job aborted due to stage failure: Task 450 in
stage 450.1 failed 4 times, most recent failure: Lost task 450.3 in stage
450.1 (TID 167370, lon4-hadoopslave-b77.lon4.spotify.net):
java.io.FileNotFoundException: /disk/hd06/yarn/local/usercache/arpteg/appcache/application_1424333823218_21217/spark-local-20150221154811-998c/03/rdd_675_450 (No such file or directory)
 at java.io.FileOutputStream.open(Native Method)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
 at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:76)
 at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:786)
 at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:637)
 at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:149)
 at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:74)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:264)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:231)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:64)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:192)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

TIA,
Anders
