spark-user mailing list archives

From: Andrew Ash <>
Subject: Re: shuffle memory requirements
Date: Tue, 30 Sep 2014 21:04:02 GMT
Hi Maddenpj,

Right now the best estimate I've heard for the open file limit is that
you'll need the square of the largest partition count in your dataset.
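
To make that concrete, here is a rough back-of-the-envelope sketch. It is only my
own illustration with a made-up partition count, assuming the hash-based shuffle
where each map task writes one file per reduce partition, so a stage with P
partitions on both the map and reduce side produces on the order of P * P shuffle
files:

    // Back-of-the-envelope estimate of total shuffle files vs. the open file limit.
    // Illustration only; the partition count below is made up.
    object ShuffleFileEstimate {
      def estimatedShuffleFiles(largestPartitionCount: Long): Long =
        largestPartitionCount * largestPartitionCount

      def main(args: Array[String]): Unit = {
        val partitions = 2000L                        // hypothetical largest partition count
        val files = estimatedShuffleFiles(partitions) // 2000 * 2000 = 4,000,000
        println(s"estimated shuffle files: $files")
        println("compare against ulimit -n on every worker before launching the job")
      }
    }

If that estimate comes out above the current ulimit -n on your workers, either raise
the limit there or repartition to fewer partitions before the shuffle.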

I filed a ticket to log the ulimit value when it's too low at

On Mon, Sep 29, 2014 at 6:20 PM, maddenpj <> wrote:

> Hey Ameet,
> Thanks for the info. I'm running into the same issue myself; my last
> attempt crashed with a ulimit of 16834. I'm going to raise it and try
> again, but yeah, I would like to know the best practice for computing this.
> Can you talk about the worker nodes? What are their specs? At least 45 GB
> of memory and 6 cores?
> Also, I left my worker at the default memory size (512m, I think) and gave
> all of the memory to the executor. My understanding was that the worker
> just spawns the executor and all the work is done in the executor. What was
> your reasoning for using 24G on the worker?
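
As I understand the worker-versus-executor memory question quoted above, those are
two different knobs: the worker daemon only launches executors, and the heap
pressure is in the executors themselves. A minimal sketch of the distinction, with
made-up example values rather than a recommendation:

    // Sketch only; the memory figures are made-up examples, not a recommendation.
    // The worker daemon keeps its small default heap; SPARK_WORKER_MEMORY in
    // conf/spark-env.sh is the total it may grant to executors on that machine,
    // e.g.  SPARK_WORKER_MEMORY=45g
    import org.apache.spark.{SparkConf, SparkContext}

    object ExecutorMemoryExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("shuffle-memory-example")
          .set("spark.executor.memory", "40g") // heap each executor JVM actually gets
        val sc = new SparkContext(conf)
        // ... job code would go here ...
        sc.stop()
      }
    }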
