spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Akshat Aranya <aara...@gmail.com>
Subject Re: Larger heap leads to perf degradation due to GC
Date Thu, 16 Oct 2014 14:47:15 GMT
I just want to pitch in and say that I ran into the same problem with
running with 64GB executors.  For example, some of the tasks take 5 minutes
to execute, out of which 4 minutes are spent in GC.  I'll try out smaller
executors.

On Mon, Oct 6, 2014 at 6:35 PM, Otis Gospodnetic <otis.gospodnetic@gmail.com
> wrote:

> Hi,
>
> The other option to consider is using G1 GC, which should behave better
> with large heaps.  But pointers are not compressed in heaps > 32 GB in
> size, so you may be better off staying under 32 GB.
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Mon, Oct 6, 2014 at 8:08 PM, Mingyu Kim <mkim@palantir.com> wrote:
>
>> Ok, cool. This seems to be general issues in JVM with very large heaps. I
>> agree that the best workaround would be to keep the heap size below 32GB.
>> Thanks guys!
>>
>> Mingyu
>>
>> From: Arun Ahuja <aahuja11@gmail.com>
>> Date: Monday, October 6, 2014 at 7:50 AM
>> To: Andrew Ash <andrew@andrewash.com>
>> Cc: Mingyu Kim <mkim@palantir.com>, "user@spark.apache.org" <
>> user@spark.apache.org>, Dennis Lawler <dlawler@palantir.com>
>> Subject: Re: Larger heap leads to perf degradation due to GC
>>
>> We have used the strategy that you suggested, Andrew - using many workers
>> per machine and keeping the heaps small (< 20gb).
>>
>> Using a large heap resulted in workers hanging or not responding (leading
>> to timeouts).  The same dataset/job for us will fail (most often due to
>> akka disassociated or fetch failures errors) with 10 cores / 100 executors,
>> 60 gb per executor while succceed with 1 core / 1000 executors / 6gb per
>> executor.
>>
>> When the job does succceed with more cores per executor and larger heap
>> it is usually much slower than the smaller executors (the same 8-10 min job
>> taking 15-20 min to complete)
>>
>> The unfortunate downside of this has been, we have had some large
>> broadcast variables which may not fit into memory (and unnecessarily
>> duplicated) when using the smaller executors.
>>
>> Most of this is anecdotal but for the most part we have had more success
>> and consistency with more executors with smaller memory requirements.
>>
>> On Sun, Oct 5, 2014 at 7:20 PM, Andrew Ash <andrew@andrewash.com> wrote:
>>
>>> Hi Mingyu,
>>>
>>> Maybe we should be limiting our heaps to 32GB max and running multiple
>>> workers per machine to avoid large GC issues.
>>>
>>> For a 128GB memory, 32 core machine, this could look like:
>>>
>>> SPARK_WORKER_INSTANCES=4
>>> SPARK_WORKER_MEMORY=32
>>> SPARK_WORKER_CORES=8
>>>
>>> Are people running with large (32GB+) executor heaps in production?  I'd
>>> be curious to hear if so.
>>>
>>> Cheers!
>>> Andrew
>>>
>>> On Thu, Oct 2, 2014 at 1:30 PM, Mingyu Kim <mkim@palantir.com> wrote:
>>>
>>>> This issue definitely needs more investigation, but I just wanted to
>>>> quickly check if anyone has run into this problem or has general guidance
>>>> around it. We’ve seen a performance degradation with a large heap on a
>>>> simple map task (I.e. No shuffle). We’ve seen the slowness starting around
>>>> from 50GB heap. (I.e. spark.executor.memoty=50g) And, when we checked the
>>>> CPU usage, there were just a lot of GCs going on.
>>>>
>>>> Has anyone seen a similar problem?
>>>>
>>>> Thanks,
>>>> Mingyu
>>>>
>>>
>>>
>>
>

Mime
View raw message