spark-issues mailing list archives

From "Dongjoon Hyun (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-27734) Add memory based thresholds for shuffle spill
Date Tue, 16 Jul 2019 16:42:16 GMT

     [ https://issues.apache.org/jira/browse/SPARK-27734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-27734:
----------------------------------
    Affects Version/s:     (was: 2.4.3)
                       3.0.0

> Add memory based thresholds for shuffle spill
> ---------------------------------------------
>
>                 Key: SPARK-27734
>                 URL: https://issues.apache.org/jira/browse/SPARK-27734
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 3.0.0
>            Reporter: Adrian Muraru
>            Priority: Minor
>
> When running large shuffles (700 TB of input data, 200k map tasks, 50k reducers on a 300-node cluster), the job regularly OOMs in both the map and reduce phases.
> IIUC, ShuffleExternalSorter (map side) and ExternalAppendOnlyMap and ExternalSorter (reduce side) try to max out the available execution memory. This in turn doesn't play nicely with the garbage collector, and executors fail with OutOfMemoryError when the memory allocated by these in-memory structures maxes out the available heap size (in our case we run with 9 cores/executor and 32 GB per executor).
> To mitigate this, I set {{spark.shuffle.spill.numElementsForceSpillThreshold}} to force the spill to disk. While this config works, it is not flexible enough: it is expressed as a number of elements, and in our case we run multiple shuffles in a single job where element sizes differ from one stage to another.
> We have an internal patch that extends this behaviour with two new parameters to control the spill based on memory usage (see the sketch after the quoted description):
> - {{spark.shuffle.spill.map.maxRecordsSizeForSpillThreshold}}
> - {{spark.shuffle.spill.reduce.maxRecordsSizeForSpillThreshold}}
>  
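As a concrete illustration of the workaround and the proposal above, here is a minimal sketch of how these settings would be applied at session creation time. The 5-million-record threshold is illustrative only; the two memory-based settings are the parameters proposed by the reporter's internal patch, are not available in upstream Spark, and their value format ("2g") is an assumption.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shuffle-spill-threshold-sketch")
  // Existing workaround: force in-memory shuffle structures to spill to disk
  // once they hold this many records (value is illustrative, not a recommendation).
  .config("spark.shuffle.spill.numElementsForceSpillThreshold", "5000000")
  // Proposed memory-based thresholds from the reporter's internal patch
  // (hypothetical: not in upstream Spark; the "2g" value format is assumed).
  // .config("spark.shuffle.spill.map.maxRecordsSizeForSpillThreshold", "2g")
  // .config("spark.shuffle.spill.reduce.maxRecordsSizeForSpillThreshold", "2g")
  .getOrCreate()
{code}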



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
