spark-issues mailing list archives

From "Dongjoon Hyun (Jira)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-32795) ApplicationInfo#removedExecutors can cause OOM
Date Fri, 11 Sep 2020 15:42:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-32795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194331#comment-17194331 ]

Dongjoon Hyun commented on SPARK-32795:
---------------------------------------

Hi, [~victor.tso]. Could you provide a reproducible example of your case?

> ApplicationInfo#removedExecutors can cause OOM
> ----------------------------------------------
>
>                 Key: SPARK-32795
>                 URL: https://issues.apache.org/jira/browse/SPARK-32795
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Victor Tso
>            Priority: Major
>         Attachments: image-2020-09-03-23-27-11-809.png
>
>
> !image-2020-09-03-23-27-11-809.png|width=840,height=439!
> In my case, the Standalone Spark master process had a max heap of 1 GB. 738 MB were consumed
> by these ExecutorDesc objects, the vast majority of which were the 18.5M removedExecutors.
> This caused the master to OOM and leave the application driver process dangling.
> The reason for this is that the worker node ran out of disk space and, for whatever reason,
> went into a fast, endless loop of launching new executors, which in turn crashed too.
> The count reached the 18M mark before the master could no longer hold the history.
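
The growth pattern described above can be illustrated with a minimal sketch. This is not Spark's code and not a proposed patch: AppInfo, ExecutorDesc (as modelled here), max_removed and simulate_crash_loop are hypothetical names chosen only for illustration. The sketch assumes that every crashed executor is appended to a per-application history, and shows how capping that history keeps memory flat during a crash loop.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorDesc:
    """Hypothetical stand-in for the master's per-executor record."""
    executor_id: int
    state: str


class AppInfo:
    """Hypothetical stand-in for ApplicationInfo with a bounded history."""

    def __init__(self, max_removed: int = 1000):
        self.executors = {}
        # A deque with maxlen silently drops the oldest entry when full,
        # so the history can never hold more than max_removed records.
        self.removed_executors = deque(maxlen=max_removed)
        # A plain counter preserves the total without keeping the objects.
        self.removed_total = 0

    def add_executor(self, executor_id: int) -> ExecutorDesc:
        desc = ExecutorDesc(executor_id, "RUNNING")
        self.executors[executor_id] = desc
        return desc

    def remove_executor(self, executor_id: int, final_state: str = "FAILED") -> None:
        # Only record executors that were actually registered.
        if self.executors.pop(executor_id, None) is not None:
            self.removed_executors.append(ExecutorDesc(executor_id, final_state))
            self.removed_total += 1


def simulate_crash_loop(app: AppInfo, launches: int) -> None:
    """Launch executors that fail immediately, as in the full-disk scenario."""
    for executor_id in range(launches):
        app.add_executor(executor_id)
        app.remove_executor(executor_id)


if __name__ == "__main__":
    app = AppInfo(max_removed=100)
    simulate_crash_loop(app, 10_000)
    print(len(app.removed_executors), app.removed_total)
```

With an unbounded list in place of the deque, the history in this sketch would grow by one record per failed launch, which is the behaviour the report attributes to removedExecutors. Whether a cap, a retry limit, or something else is the right remedy for the real master is left open here.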



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


