spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eugen Cepoi <>
Subject Re: an OOM while persist as DISK_ONLY
Date Fri, 04 Mar 2016 00:11:54 GMT
You can limit the amount of memory spark will use for shuffle even in 1.6.
You can do that by tweaking the spark.memory.fraction and the For example if you want to have no shuffle cache at
all you can set the storage.fraction to 1 or something close, to let a
small place for the shuffle cache. And then use the rest for storage, and
if you don't persist/broadcast data then you can reduce the whole

Though not sure how good it is to tweak those values, as it assumes spark
is mostly using it for caching stuff... I have used similar tweaks in spark
1.4 and tried it on spark 1.6 and that solved some problems...


2016-03-03 15:59 GMT-08:00 Andy Dang <>:

> Spark shuffling algorithm is very aggressive in storing everything in RAM,
> and the behavior is worse in 1.6 with the UnifiedMemoryManagement. At least
> in previous versions you can limit the shuffler memory, but Spark 1.6 will
> use as much memory as it can get. What I see is that Spark seems to
> underestimate the amount of memory that objects take up, and thus doesn't
> spill frequently enough. There's a dirty work around (legacy mode) but the
> common advice is to increase your parallelism (and keep in mind that
> operations such as join have implicit parallelism, so you'll want to be
> explicit about it).
> -------
> Regards,
> Andy
> On Mon, Feb 22, 2016 at 2:12 PM, Alex Dzhagriev <> wrote:
>> Hello all,
>> I'm using spark 1.6 and trying to cache a dataset which is 1.5 TB, I have
>> only ~800GB RAM  in total, so I am choosing the DISK_ONLY storage level.
>> Unfortunately, I'm getting out of the overhead memory limit:
>> Container killed by YARN for exceeding memory limits. 27.0 GB of 27 GB physical memory
used. Consider boosting spark.yarn.executor.memoryOverhead.
>> I'm giving 6GB overhead memory and using 10 cores per executor.
>> Apparently, that's not enough. Without persisting the data and later
>> computing the dataset (twice in my case) the job works fine. Can anyone,
>> please, explain what is the overhead which consumes that much memory during
>> persist to the disk and how can I estimate what extra memory should I give
>> to the executors in order to make it not fail?
>> Thanks, Alex.

View raw message