spark-user mailing list archives
From Matthew Cheah <matthew.c.ch...@gmail.com>
Subject "Too many open files" exception on reduceByKey
Date Mon, 10 Mar 2014 17:41:02 GMT
Hi everyone,

My team (cc'ed on this e-mail) and I are running a Spark reduceByKey
operation on a cluster of 10 slaves where "ulimit -n" returns 1024 on each
machine, and I don't have the privileges to raise it.

When I attempt to run this job, with the input data coming from a text file
stored in an HDFS cluster running on the same nodes as the Spark cluster,
the job crashes with the message "Too many open files".
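
For context, the job is essentially of the following shape; the master URL,
HDFS path, and key/value extraction below are simplified placeholders, not
the exact code we run:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // brings reduceByKey onto pair RDDs

    // Placeholder master URL, app name, and input path.
    val sc = new SparkContext("spark://master:7077", "reduceByKeyJob")
    val lines = sc.textFile("hdfs://namenode:8020/path/to/input.txt")
    // Build (key, value) pairs from each line and sum the values per key.
    val counts = lines
      .map(line => (line.split("\t")(0), 1))
      .reduceByKey(_ + _)
    counts.count()  // force the shuffle to run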

My question is: why are so many files being created, and is there a way to
configure the Spark context to avoid creating that many files? I am already
setting spark.shuffle.consolidateFiles to true.
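
Concretely, I'm setting it along these lines when building the context (the
app name and master URL are placeholders; the relevant line is the
spark.shuffle.consolidateFiles setting):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("reduceByKeyJob")        // placeholder app name
      .setMaster("spark://master:7077")    // placeholder master URL
      // Ask the shuffle to consolidate intermediate files so that fewer
      // file handles are held open at once.
      .set("spark.shuffle.consolidateFiles", "true")
    val sc = new SparkContext(conf)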

I want to repeat: I can't change the maximum number of open file
descriptors on the machines. The cluster is not owned by me, and the system
administrator has been slow to respond.

Thanks,

-Matt Cheah
