spark-user mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: Larger heap leads to perf degradation due to GC
Date Sun, 05 Oct 2014 23:20:46 GMT
Hi Mingyu,

Maybe we should be limiting our heaps to 32GB max and running multiple
workers per machine to avoid long GC pauses.

For a machine with 128GB of memory and 32 cores, this could look like:

SPARK_WORKER_INSTANCES=4
SPARK_WORKER_MEMORY=32g
SPARK_WORKER_CORES=8

Are people running with large (32GB+) executor heaps in production?  I'd be
curious to hear if so.

Cheers!
Andrew

On Thu, Oct 2, 2014 at 1:30 PM, Mingyu Kim <mkim@palantir.com> wrote:

> This issue definitely needs more investigation, but I just wanted to
> quickly check if anyone has run into this problem or has general guidance
> around it. We’ve seen a performance degradation with a large heap on a
> simple map task (i.e. no shuffle). We’ve seen the slowness starting at
> around a 50GB heap (i.e. spark.executor.memory=50g), and when we checked
> the CPU usage, there was just a lot of GC going on.
>
> Has anyone seen a similar problem?
>
> Thanks,
> Mingyu
>
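
For confirming that GC is really the bottleneck, one option is to turn on GC logging on the executors via `spark.executor.extraJavaOptions`, which Spark passes through to the executor JVMs. A minimal sketch (the application jar name is hypothetical; the JVM flags are the standard JDK 7/8 GC-logging flags):

```shell
# Enable GC logging on executors to see pause times and collection frequency
# alongside the large heap that triggers the slowdown.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --conf spark.executor.memory=50g \
  your-app.jar
```

The resulting GC lines show up in each executor's stdout, so frequent full collections at the 50GB heap size should be directly visible there.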
