spark-user mailing list archives

From Andrew Ash <>
Subject Re: Shuffle files
Date Fri, 26 Sep 2014 03:09:49 GMT
Hi SK,

For the problem of many shuffle files triggering the "too many open files"
exception, there are a couple of options:

1. The Linux kernel limits how many files a process can hold open at once.
Check the current limit with ulimit -n; raise it permanently via the nofile
entries in /etc/security/limits.conf (fs.file-max in /etc/sysctl.conf governs
the separate system-wide cap). Try increasing it to a large value, at the bare
minimum the square of your partition count -- the hash-based shuffle writes
roughly one file per map task per reduce partition, so the file count grows
multiplicatively.
2. Try using shuffle consolidation -- set spark.shuffle.consolidateFiles=true.
This option reuses shuffle output files across map tasks running on the same
core, so far fewer files are written to disk and the limit is much harder to hit.
3. Try using the sort-based shuffle by setting spark.shuffle.manager=SORT.
You should likely hold off on this until the known issues with it are fixed,
hopefully in an upcoming release.
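The three options above can be sketched together as shell steps (a sketch only;
the limit value and the spark-submit flags are illustrative, not prescriptions
for your cluster):

```shell
# Option 1: check the current per-process open-file limit.
ulimit -n

# To raise it permanently, add nofile entries to /etc/security/limits.conf,
# e.g. (1048576 is an illustrative value, not a recommendation):
#   *  soft  nofile  1048576
#   *  hard  nofile  1048576
# (fs.file-max in /etc/sysctl.conf is the separate system-wide cap.)

# Options 2 and 3 can be enabled per job without editing spark-defaults.conf:
#   spark-submit --conf spark.shuffle.consolidateFiles=true \
#                --conf spark.shuffle.manager=SORT \
#                ...
```

Changes to limits.conf only take effect for new login sessions, so re-login (or
restart the Spark daemons) after editing it.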

Hope that helps!

On Thu, Sep 25, 2014 at 4:20 PM, SK <> wrote:

> Hi,
> I am using Spark 1.1.0 on a cluster. My job takes as input 30 files in a
> directory (I am using sc.textFile("dir/*") to read them in). I am
> getting the following warning:
> WARN TaskSetManager: Lost task 99.0 in stage 1.0 (TID 99,
> /tmp/spark-local-20140925215712-0319/12/shuffle_0_99_93138 (Too many open
> files)
> basically I think a lot of shuffle files are being created.
> 1) The tasks eventually fail and the job just hangs (after taking very long,
> more than an hour). If I read these 30 files in a for loop, the same job
> completes in a few minutes. However, I then need to specify the file names,
> which is not convenient. I am assuming that sc.textFile("dir/*") creates a
> single large RDD covering all 30 files. Is there a way to make the operation
> on this large RDD efficient so as to avoid creating too many shuffle files?
> 2) Also, I am finding that the shuffle files for my other, completed jobs
> are not being automatically deleted even after days. I thought that
> sc.stop() clears the intermediate files. Is there some way to
> programmatically delete these temp shuffle files upon job completion?
> thanks
> --
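On the second question, one common workaround (a hedged sketch, not a built-in
Spark feature) is to sweep stale local shuffle directories with a periodic job.
The spark-local-* pattern matches the path in the warning above, but adjust it
if spark.local.dir points somewhere other than /tmp, and only run this while no
jobs are active:

```shell
# Remove local Spark shuffle directories untouched for more than two days.
# WARNING: assumes the default spark.local.dir of /tmp; verify before use,
# and never run while jobs that may still need these files are running.
find /tmp -maxdepth 1 -type d -name 'spark-local-*' -mtime +2 -exec rm -rf {} +
```

Dropping that line into a daily cron entry keeps the disk from filling up
between cluster restarts.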
