spark-user mailing list archives

From "Shao, Saisai" <saisai.s...@intel.com>
Subject RE: Spilled shuffle files not being cleared
Date Fri, 13 Jun 2014 02:32:40 GMT
Hi Michael,

I think you can set spark.cleaner.ttl=xxx to enable the time-based metadata cleaner, which
will clean up old, unused shuffle data once it exceeds the TTL.
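For example, something like the following (a minimal sketch; the one-hour TTL and the
app name are placeholder values, not recommendations):

    import org.apache.spark.{SparkConf, SparkContext}

    // Enable the time-based cleaner: metadata and shuffle data older than
    // the TTL (in seconds) are periodically cleaned up.
    val conf = new SparkConf()
      .setAppName("ttl-cleaner-sketch")  // hypothetical app name
      .set("spark.cleaner.ttl", "3600")  // e.g. clean anything older than one hour

    val sc = new SparkContext(conf)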

For Spark 1.0, another option is to clean shuffle data via weak references (reference-tracking
based; the configuration is spark.cleaner.referenceTracking), and it is enabled by default.
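A rough sketch of how that behaves (assuming Spark 1.0; the app name and input path are
hypothetical): once the RDDs behind a shuffle become unreachable, a GC lets the cleaner
remove the corresponding shuffle files.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD implicits (needed pre-1.3)

    val conf = new SparkConf()
      .setAppName("reference-tracking-sketch")         // hypothetical app name
      .set("spark.cleaner.referenceTracking", "true")  // already the default in 1.0

    val sc = new SparkContext(conf)

    def runStage(): Long = {
      // reduceByKey writes shuffle files; Spark tracks them via weak references.
      val counts = sc.textFile("hdfs:///some/input")   // hypothetical path
        .map(line => (line, 1))
        .reduceByKey(_ + _)
      counts.count()
    } // `counts` (and its shuffle dependency) become unreachable here

    runStage()
    System.gc() // a GC prompts the cleaner to delete the now-unreferenced shuffle files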

Thanks
Saisai

From: Michael Chang <mike@tellapart.com>
Sent: Friday, June 13, 2014 10:15 AM
To: user@spark.apache.org
Subject: Re: Spilled shuffle files not being cleared

Bump

On Mon, Jun 9, 2014 at 3:22 PM, Michael Chang <mike@tellapart.com> wrote:
Hi all,

I'm seeing exceptions like the one below in Spark 0.9.1.  It looks like I'm running
out of inodes on my machines (I have around 300k on each machine in a 12-machine cluster).  I took
a quick look and I'm seeing some shuffle spill files that are still around even 12 minutes
after they were created.  Can someone help me understand when these shuffle spill files should
be cleaned up?  (Is it as soon as they are used?)

Thanks,
Michael


java.io.FileNotFoundException: /mnt/var/hadoop/1/yarn/local/usercache/ubuntu/appcache/application_1399886706975_13107/spark-local-20140609210947-19e1/1c/shuffle_41637_3_0
(No space left on device)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:118)
        at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:179)
        at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:164)
        at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:161)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
14/06/09 22:07:36 WARN TaskSetManager: Lost TID 667432 (task 86909.0:7)
14/06/09 22:07:36 WARN TaskSetManager: Loss was due to java.io.FileNotFoundException
