spark-issues mailing list archives

From "Hyukjin Kwon (Jira)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-30379) Avoid OOM when using collection accumulator
Date Tue, 31 Dec 2019 03:46:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-30379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-30379.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 27038
[https://github.com/apache/spark/pull/27038]

> Avoid OOM when using collection accumulator
> -------------------------------------------
>
>                 Key: SPARK-30379
>                 URL: https://issues.apache.org/jira/browse/SPARK-30379
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>             Fix For: 3.0.0
>
>
> One Spark job on our cluster uses a collection accumulator to collect values and
> encountered an exception like the following:
> ```
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:3332)
>     at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
>     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
>     at java.lang.StringBuilder.append(StringBuilder.java:136)
>     at java.lang.StringBuilder.append(StringBuilder.java:131)
>     at java.util.AbstractCollection.toString(AbstractCollection.java:462)
>     at java.util.Collections$UnmodifiableCollection.toString(Collections.java:1035)
>     at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2$$anonfun$apply$3.apply(LiveEntity.scala:596)
>     at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2$$anonfun$apply$3.apply(LiveEntity.scala:596)
>     at scala.Option.map(Option.scala:146)
>     at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2.apply(LiveEntity.scala:596)
>     at org.apache.spark.status.LiveEntityHelpers$$anonfun$newAccumulatorInfos$2.apply(LiveEntity.scala:591)
> ```
> `LiveEntityHelpers.newAccumulatorInfos` converts each `AccumulableInfo` to a `v1.AccumulableInfo`
> by calling `toString` on the accumulator's value. For a collection accumulator, the string
> representation can take much more memory than the collection itself (for example, a collection
> accumulator of long values) and cause an OOM. In this job, the driver memory is 6g.
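The Java sketch below illustrates the size argument from the description and one possible mitigation: rendering only a bounded prefix of the collection. It does not use Spark and is not the change made in pull request 27038. The class name `AccumulatorRenderSketch`, the helper `boundedToString`, and the 100,000-element input are hypothetical. The read-only view mirrors the `Collections$UnmodifiableCollection.toString` frame in the stack trace.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class AccumulatorRenderSketch {

    // Hypothetical helper (not a Spark API): renders at most maxElements items,
    // then appends a marker with the number of omitted items. The output size is
    // bounded by maxElements rather than by the size of the collection.
    static String boundedToString(Collection<?> values, int maxElements) {
        StringBuilder sb = new StringBuilder("[");
        Iterator<?> it = values.iterator();
        int shown = 0;
        while (it.hasNext() && shown < maxElements) {
            if (shown > 0) sb.append(", ");
            sb.append(it.next());
            shown++;
        }
        int omitted = values.size() - shown;
        if (omitted > 0) {
            if (shown > 0) sb.append(", ");
            sb.append("... (").append(omitted).append(" more)");
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // Simulated accumulator value: a read-only view over 13-digit Longs.
        List<Long> raw = new ArrayList<>();
        for (long i = 0; i < 100_000L; i++) {
            raw.add(1_000_000_000_000L + i);
        }
        Collection<Long> value = Collections.unmodifiableCollection(raw);

        // Full rendering: 13 digits per element plus ", " between elements,
        // built in a StringBuilder that repeatedly grows via Arrays.copyOf.
        String full = value.toString();
        System.out.println("full length (chars): " + full.length()); // prints 1500000

        // Bounded rendering: only the first 5 elements become text.
        System.out.println("bounded: " + boundedToString(value, 5));
    }
}
```

The trade-off in this sketch is that a UI built on it would show an incomplete value; in exchange, the memory spent on the string no longer grows with the number of collected elements.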



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


