spark-user mailing list archives

From Petar Zecevic <petar.zece...@gmail.com>
Subject Re: Missing shuffle files
Date Sat, 21 Feb 2015 19:58:50 GMT

Could you try to turn on the external shuffle service?

spark.shuffle.service.enabled=true
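
For reference, a minimal sketch of setting this from application code, assuming a post-1.2 build and a plain SparkConf-based setup (the application name is illustrative, and on YARN the spark_shuffle auxiliary service also has to be registered on the NodeManagers, which is not shown here):

  // Sketch: enable the external shuffle service via SparkConf.
  // Dynamic allocation is optional, but it is the usual reason
  // to run the external service.
  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("shuffle-service-example")
    .set("spark.shuffle.service.enabled", "true")
    .set("spark.dynamicAllocation.enabled", "true")

  val sc = new SparkContext(conf)

The same property can also be passed on the command line with --conf spark.shuffle.service.enabled=true when using spark-submit.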


On 21.2.2015. 17:50, Corey Nolet wrote:
> I'm experiencing the same issue. Upon closer inspection I'm noticing
> that executors are being lost as well. Thing is, I can't figure out
> how they are dying. I'm using MEMORY_AND_DISK_SER and I've got over
> 1.3TB of memory allocated for the application. I was thinking a single
> executor might be getting one or a couple of large partitions, but
> shouldn't disk persistence kick in at that point? (A short persistence
> sketch follows the quoted thread below.)
>
> On Sat, Feb 21, 2015 at 11:20 AM, Anders Arpteg <arpteg@spotify.com> wrote:
>
>     For large jobs, the following error message is shown, which seems
>     to indicate that shuffle files are missing for some reason. It's a
>     rather large job with many partitions. If the data size is
>     reduced, the problem disappears. I'm running a build from Spark
>     master post 1.2 (built on 2015-01-16) and running on YARN 2.2. Any
>     idea how to resolve this problem?
>
>     User class threw exception: Job aborted due to stage failure: Task
>     450 in stage 450.1 failed 4 times, most recent failure: Lost task
>     450.3 in stage 450.1 (TID 167370, lon4-hadoopslave-b77.lon4.spotify.net):
>     java.io.FileNotFoundException:
>     /disk/hd06/yarn/local/usercache/arpteg/appcache/application_1424333823218_21217/spark-local-20150221154811-998c/03/rdd_675_450
>     (No such file or directory)
>     at java.io.FileOutputStream.open(Native Method)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
>     at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:76)
>     at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:786)
>     at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:637)
>     at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:149)
>     at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:74)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:264)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:231)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:192)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
>     TIA,
>     Anders
>
>
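
On the MEMORY_AND_DISK_SER question in the quoted message: with that storage level, partitions that do not fit in memory are serialized and written to the executor's local directories, and the DiskStore.putIterator call in the trace above is exactly that write path for the cached block rdd_675_450. A minimal sketch of applying the storage level, with a hypothetical input path and assuming an existing SparkContext sc:

  // Sketch: cache an RDD with the MEMORY_AND_DISK_SER storage level.
  // The input path and transformation are illustrative only.
  import org.apache.spark.storage.StorageLevel

  val records = sc.textFile("hdfs:///some/input/path")
    .map(_.split('\t'))
    .persist(StorageLevel.MEMORY_AND_DISK_SER)

  records.count()  // first action materializes the cache; partitions that
                   // overflow memory are serialized to the executor's local dirs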

