spark-user mailing list archives

From Eugen Cepoi <cepoi.eu...@gmail.com>
Subject Re: an OOM while persist as DISK_ONLY
Date Fri, 04 Mar 2016 00:45:01 GMT
We are in the process of upgrading from Spark 1.4 to 1.6, and had a hard
time getting some of our more memory- and join-intensive jobs to work (RDD
caching plus a lot of shuffling). Most of the time they were getting killed
by YARN.

Increasing the overhead was of course an option, but the increase needed to
make the job pass was far higher than the overhead we used with Spark 1.4,
which is too much to be acceptable.

Playing with the configs above reduced the GC time, but the problem
persisted.

In the end, it turned out we were hitting this issue:
https://issues.apache.org/jira/browse/SPARK-12961.
What ended up working was overriding the Snappy version that ships with
EMR and disabling off-heap memory.

We still need to test the upgrade against our Spark Streaming jobs.
Hopefully this issue, https://issues.apache.org/jira/browse/SPARK-13288, is
also caused by Snappy.

Cheers,
Eugen


2016-03-03 16:14 GMT-08:00 Ted Yu <yuzhihong@gmail.com>:

> bq. that solved some problems
>
> Is there any problem that the tweak did not solve?
>
> Thanks
>
> On Thu, Mar 3, 2016 at 4:11 PM, Eugen Cepoi <cepoi.eugen@gmail.com> wrote:
>
>> You can limit the amount of memory Spark uses for shuffle even in 1.6 by
>> tweaking spark.memory.fraction and spark.memory.storageFraction. For
>> example, if you want almost no shuffle cache, you can set storageFraction
>> to 1 or something close to it, leaving a small amount of room for the
>> shuffle cache and using the rest for storage. If you don't persist or
>> broadcast data, you can then also reduce memory.fraction as a whole.
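To make the effect of those two settings concrete, here is a rough sketch of how Spark 1.6's unified memory manager sizes its regions. It assumes the documented 1.6 defaults (300 MB reserved, spark.memory.fraction = 0.75, spark.memory.storageFraction = 0.5), ignores off-heap memory, and uses a hypothetical helper name, `unified_memory_regions`, that is not a Spark API. The 21 GB heap is only illustrative (roughly what the 27 GB limit minus 6 GB overhead further down this thread implies).

```python
# Assumed Spark 1.6 defaults: 300 MB reserved, spark.memory.fraction=0.75,
# spark.memory.storageFraction=0.5. Off-heap memory is not modeled.
RESERVED_MB = 300

def unified_memory_regions(heap_mb, memory_fraction=0.75, storage_fraction=0.5):
    """Hypothetical helper (not a Spark API): split an executor heap in MB.

    Returns (unified_mb, storage_mb, execution_mb):
      unified_mb   -- pool shared by execution (shuffles, joins, sorts) and storage (cache)
      storage_mb   -- portion where cached blocks cannot be evicted by execution
      execution_mb -- portion execution can always claim, even if the cache is full
    """
    usable = heap_mb - RESERVED_MB
    unified = usable * memory_fraction
    storage = unified * storage_fraction
    execution = unified - storage
    return unified, storage, execution

# Compare the default storageFraction with a storage-heavy one, for a 21 GB heap.
for sf in (0.5, 0.9):
    unified, storage, execution = unified_memory_regions(21 * 1024, storage_fraction=sf)
    print(f"storageFraction={sf}: unified={unified:.0f} MB, "
          f"storage={storage:.0f} MB, execution={execution:.0f} MB")
```

Raising storageFraction shrinks the memory execution is guaranteed; execution can still borrow free storage memory at runtime, so this is a lower bound rather than a hard cap.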
>>
>> That said, I'm not sure how wise it is to tweak those values, as it assumes
>> Spark mostly uses that memory for caching. I used similar tweaks with Spark
>> 1.4, tried them on Spark 1.6, and that solved some problems...
>>
>> Eugen
>>
>> 2016-03-03 15:59 GMT-08:00 Andy Dang <namd88@gmail.com>:
>>
>>> Spark's shuffle algorithm is very aggressive about keeping everything in
>>> RAM, and the behavior is worse in 1.6 with the UnifiedMemoryManager. In
>>> previous versions you could at least limit the shuffle memory
>>> (spark.shuffle.memoryFraction), but Spark 1.6 will use as much memory as
>>> it can get. What I see is that Spark seems to underestimate how much
>>> memory objects take up, and thus doesn't spill often enough. There's a
>>> dirty workaround (legacy mode, spark.memory.useLegacyMode), but the common
>>> advice is to increase your parallelism (and keep in mind that operations
>>> such as join have implicit parallelism, so you'll want to be explicit
>>> about it).
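One way to act on the "increase your parallelism" advice is to derive a partition count from the data size, so each task's shuffle data stays small. The sketch below assumes a target of 128 MB per partition; that target and the helper name `partitions_for` are illustrative assumptions, not Spark settings or APIs. The resulting number could then be passed explicitly, for example as the numPartitions argument of RDD.join, instead of relying on the implicit default.

```python
import math

def partitions_for(data_bytes, target_partition_bytes=128 * 1024**2):
    """Hypothetical helper: smallest partition count keeping each partition
    at or below the target size (128 MB target is an assumption)."""
    if data_bytes <= 0:
        return 1
    return math.ceil(data_bytes / target_partition_bytes)

# The 1.5 TB dataset from the original question, split into <= 128 MB partitions.
dataset_bytes = int(1.5 * 1024**4)
print(partitions_for(dataset_bytes))
```

Smaller partitions mean each task holds less data in memory before spilling, at the cost of more tasks and more shuffle files.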
>>>
>>> -------
>>> Regards,
>>> Andy
>>>
>>> On Mon, Feb 22, 2016 at 2:12 PM, Alex Dzhagriev <dzhagr@gmail.com>
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> I'm using Spark 1.6 and trying to cache a 1.5 TB dataset. I have only
>>>> ~800 GB of RAM in total, so I'm using the DISK_ONLY storage level.
>>>> Unfortunately, I'm exceeding the overhead memory limit:
>>>>
>>>>
>>>> Container killed by YARN for exceeding memory limits. 27.0 GB of 27 GB physical
>>>> memory used. Consider boosting spark.yarn.executor.memoryOverhead.
>>>>
>>>>
>>>> I'm giving 6 GB of overhead memory and using 10 cores per executor;
>>>> apparently that's not enough. Without persisting the data (and instead
>>>> computing the dataset twice, in my case), the job works fine. Could
>>>> anyone please explain what overhead consumes that much memory when
>>>> persisting to disk, and how I can estimate how much extra memory to give
>>>> the executors so the job doesn't fail?
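For the sizing part of the question, the physical limit YARN enforces on an executor container is roughly spark.executor.memory plus spark.yarn.executor.memoryOverhead; in Spark 1.6 the overhead defaults to 10% of executor memory with a 384 MB minimum. The sketch below reproduces only that arithmetic under those assumptions; it ignores YARN's rounding of requests up to its minimum allocation, and the helper names are hypothetical. The overhead covers memory outside the JVM heap (VM overheads, interned strings, other native allocations), which is why native leaks like the Snappy one mentioned above show up as overhead rather than heap usage.

```python
# Assumed Spark 1.6 YARN defaults for spark.yarn.executor.memoryOverhead.
MIN_OVERHEAD_MB = 384
OVERHEAD_FACTOR = 0.10

def default_overhead_mb(executor_memory_mb):
    """Hypothetical helper: default overhead, 10% of executor memory, min 384 MB."""
    return max(int(executor_memory_mb * OVERHEAD_FACTOR), MIN_OVERHEAD_MB)

def container_limit_mb(executor_memory_mb, overhead_mb=None):
    """Hypothetical helper: physical memory YARN allows the executor container
    (before any rounding YARN applies to the request)."""
    if overhead_mb is None:
        overhead_mb = default_overhead_mb(executor_memory_mb)
    return executor_memory_mb + overhead_mb

# Alex's setup: a 27 GB limit with 6 GB explicit overhead implies a 21 GB heap.
heap_mb = 27 * 1024 - 6 * 1024
print(heap_mb, container_limit_mb(heap_mb, overhead_mb=6 * 1024))
# With the default overhead, the same heap would get only about 2 GB of headroom.
print(default_overhead_mb(heap_mb))
```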
>>>>
>>>> Thanks, Alex.
>>>>
>>>
>>>
>>
>
