spark-user mailing list archives

From Matthew Cheah <matthew.c.ch...@gmail.com>
Subject Re: "Too many open files" exception on reduceByKey
Date Tue, 11 Mar 2014 20:49:51 GMT
Sorry, I also have some follow-up questions.

"In general if a node in your cluster has C assigned cores and you run a
job with X reducers then Spark will open C*X files in parallel and start
writing."

Some questions came to mind just now:
1) Could you give a brief overview of what these files are being used for?
2) Are these C*X files opened on each machine? Also, is C the total
number of cores across all machines in the cluster?
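To make question 2 concrete, here's a quick back-of-the-envelope sketch
(the per-node reading of C is just my guess at this point, and the core
count is hypothetical):

```python
# Back-of-the-envelope check of the C*X claim against a 1024 ulimit.
# Assumption (my guess, pending an answer to question 2): C is the
# number of cores assigned to Spark on a single node, not cluster-wide.
ULIMIT = 1024          # open-file limit reported by `ulimit -n`
CORES_PER_NODE = 8     # hypothetical C for one worker node

# If peak open shuffle files per node ~= C * X, the largest reducer
# count X that stays under the limit is roughly:
max_reducers = ULIMIT // CORES_PER_NODE
print(max_reducers)    # 128
```

If that reading is right, a cluster with 8-core workers and a 1024
ulimit would cap out at around 128 reducers.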

Thanks,

-Matt Cheah


On Tue, Mar 11, 2014 at 4:35 PM, Matthew Cheah <matthew.c.cheah@gmail.com> wrote:

> Thanks. Just curious, is there a default number of reducers that is used?
>
> -Matt Cheah
>
>
> On Mon, Mar 10, 2014 at 7:22 PM, Patrick Wendell <pwendell@gmail.com> wrote:
>
>> Hey Matt,
>>
>> The best way is definitely just to increase the ulimit if possible;
>> it's sort of an assumption we make in Spark that clusters will be
>> able to raise it.
>>
>> You might be able to hack around this by decreasing the number of
>> reducers but this could have some performance implications for your
>> job.
>>
>> In general if a node in your cluster has C assigned cores and you run
>> a job with X reducers then Spark will open C*X files in parallel and
>> start writing. Shuffle consolidation will help decrease the total
>> number of files created but the number of file handles open at any
>> time doesn't change so it won't help the ulimit problem.
>>
>> This means you'll have to use fewer reducers (e.g. pass reduceByKey a
>> number of reducers) or use fewer cores on each machine.
>>
>> - Patrick
>>
>> On Mon, Mar 10, 2014 at 10:41 AM, Matthew Cheah
>> <matthew.c.cheah@gmail.com> wrote:
>> > Hi everyone,
>> >
>> > My team (cc'ed in this e-mail) and I are running a Spark reduceByKey
>> > operation on a cluster of 10 slaves where I don't have the privileges
>> > to set "ulimit -n" to a higher number. I'm running on a cluster where
>> > "ulimit -n" returns 1024 on each machine.
>> >
>> > When I attempt to run this job with the data originating from a text
>> > file, stored in an HDFS cluster running on the same nodes as the
>> > Spark cluster, the job crashes with the message, "Too many open
>> > files".
>> >
>> > My question is, why are so many files being created, and is there a
>> > way to configure the Spark context to avoid spawning that many files?
>> > I am already setting spark.shuffle.consolidateFiles to true.
>> >
>> > I want to repeat - I can't change the maximum number of open file
>> > descriptors on the machines. This cluster is not owned by me and the
>> > system administrator is responding quite slowly.
>> >
>> > Thanks,
>> >
>> > -Matt Cheah
>>
>
>
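Patrick's file-count arithmetic above can be sketched as a quick model
(a rough illustration only; the function, names, and numbers are made
up for this sketch, not Spark's actual shuffle internals):

```python
# Rough model of the quoted explanation: with hash-based shuffle, each
# concurrently running map task writes one file per reducer, so a node
# with C active cores holds about C * X file handles open at once.
# Shuffle consolidation reuses the same C * X files across map-task
# waves, cutting the *total* files created but not the *peak* handles.
# All names and numbers here are illustrative, not Spark internals.

def shuffle_file_counts(cores, reducers, map_tasks_on_node, consolidate):
    peak_open = cores * reducers              # handles open at any instant
    if consolidate:
        total_created = cores * reducers      # files reused across waves
    else:
        total_created = map_tasks_on_node * reducers
    return peak_open, total_created

# 8 cores, 200 reducers, 100 map tasks scheduled on this node:
print(shuffle_file_counts(8, 200, 100, consolidate=False))  # (1600, 20000)
print(shuffle_file_counts(8, 200, 100, consolidate=True))   # (1600, 1600)
```

Either way peak_open stays at 1600, which is why consolidation doesn't
help with a 1024 ulimit while fewer reducers or fewer cores per machine
would.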
