spark-user mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: shuffle memory requirements
Date Tue, 30 Sep 2014 21:04:02 GMT
Hi Maddenpj,

Right now the best estimate I've heard for the open file limit is that
you'll need the square of the largest partition count in your dataset.
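
As a back-of-the-envelope illustration of that rule of thumb (an estimate, not an exact requirement), the arithmetic can be sketched as below. Both helper names are hypothetical. Note that under this heuristic a limit of 16834, like the one mentioned below, only covers up to 129 partitions (129² = 16,641, while 130² = 16,900).

```python
import math


def estimated_open_files(largest_partition_count: int) -> int:
    """Open-file estimate under the 'square of the largest partition count'
    rule of thumb (hypothetical helper, heuristic only)."""
    if largest_partition_count < 0:
        raise ValueError("partition count must be non-negative")
    return largest_partition_count ** 2


def max_partitions_for_limit(open_file_limit: int) -> int:
    """Largest partition count whose squared estimate fits under the limit."""
    if open_file_limit < 0:
        raise ValueError("limit must be non-negative")
    return math.isqrt(open_file_limit)


print(estimated_open_files(200))        # → 40000
print(max_partitions_for_limit(16834))  # → 129
```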

I filed a ticket to log the ulimit value when it's too low at
https://issues.apache.org/jira/browse/SPARK-3750
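
A minimal sketch of the kind of check that ticket asks for, assuming a Unix host: it reads the process's soft open-file limit with Python's standard `resource` module and warns when it falls below the squared-partition estimate. The helper name is hypothetical, and this is not Spark's own implementation.

```python
import resource
import warnings


def check_open_file_limit(largest_partition_count: int) -> bool:
    """Warn if the soft RLIMIT_NOFILE looks too low for the heuristic.

    Hypothetical helper; returns True when the limit looks sufficient.
    """
    needed = largest_partition_count ** 2  # rule of thumb from this thread
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < needed:
        warnings.warn(
            f"open file limit (ulimit -n) is {soft}, "
            f"but roughly {needed} may be needed"
        )
        return False
    return True
```

The limit itself is typically raised outside Spark, for example with `ulimit -n` in the environment that launches the worker.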

On Mon, Sep 29, 2014 at 6:20 PM, maddenpj <maddenpj@gmail.com> wrote:

> Hey Ameet,
>
> Thanks for the info. I'm running into the same issue myself: my last
> attempt crashed, and my ulimit was 16834. I'm going to raise it and try
> again, but I would like to know the best practice for computing this. Can you
> talk about the worker nodes, what are their specs? At least 45 gigs of
> memory and 6 cores?
>
> Also, I left my worker at the default memory size (512m, I think) and gave
> all of the memory to the executor. It was my understanding that the worker
> just spawns the executor, but all the work is done in the executor. What
> was your reasoning for using 24G on the worker?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/shuffle-memory-requirements-tp4048p15375.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
