spark-issues mailing list archives

From "Sean Owen (Jira)" <>
Subject [jira] [Updated] (SPARK-28843) Set OMP_NUM_THREADS to executor cores to reduce Python memory consumption
Date Thu, 29 Aug 2019 01:51:00 GMT


Sean Owen updated SPARK-28843:
    Docs Text: PySpark workers now set the env variable OMP_NUM_THREADS (if not already set)
to the number of cores used by an executor (spark.executor.cores). Previously, when the variable
was unset, OpenMP defaulted to the total number of VM cores. This avoids excessively large
OpenMP thread pools when using, for example, numpy.
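
As an illustrative workaround on releases without this change, the variable can also be
pinned explicitly through Spark's spark.executorEnv.* mechanism; the core count of 4 below
is an assumption for the sketch:

    # A minimal sketch, assuming PySpark is installed and 4 cores per executor.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.executor.cores", "4")
        # Executor processes (and the Python workers they fork) inherit
        # environment variables set via spark.executorEnv.*.
        .config("spark.executorEnv.OMP_NUM_THREADS", "4")
        .getOrCreate()
    )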

> Set OMP_NUM_THREADS to executor cores to reduce Python memory consumption
> -------------------------------------------------------------------------
>                 Key: SPARK-28843
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.3.3, 3.0.0, 2.4.3
>            Reporter: Ryan Blue
>            Priority: Major
>              Labels: release-notes
> While testing hardware with more cores, we found that the amount of memory required by
> PySpark applications increased, and we tracked the problem to importing numpy. The numpy
> issue is []
> NumPy uses OpenMP, which starts a thread pool with as many threads as there are cores on
> the machine (and does not respect cgroups). When we set this lower, we see a significant
> reduction in memory consumption.
> This parallelism setting should be set to the number of cores allocated to the executor,
> not the number of cores available on the machine.
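
As a concrete illustration of the pattern the fix relies on, a minimal sketch follows; the
value 4 stands in for the executor's allocated cores and is an assumption for the example.
The variable has to be in place before numpy is first imported, since the OpenMP runtime
reads it when it initializes:

    import os

    # Cap the OpenMP pool before numpy (and its BLAS backend) loads.
    # "4" is a stand-in for spark.executor.cores.
    os.environ.setdefault("OMP_NUM_THREADS", "4")

    import numpy as np  # OpenMP-backed BLAS builds now start at most 4 threads

    a = np.random.rand(1000, 1000)
    b = a @ a  # the matrix multiply runs on the capped thread pool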
