spark-issues mailing list archives

From "Ryan Blue (Jira)" <j...@apache.org>
Subject [jira] [Created] (SPARK-28843) Set OMP_NUM_THREADS to executor cores to reduce Python memory consumption
Date Wed, 21 Aug 2019 22:55:00 GMT
Ryan Blue created SPARK-28843:
---------------------------------

             Summary: Set OMP_NUM_THREADS to executor cores to reduce Python memory consumption
                 Key: SPARK-28843
                 URL: https://issues.apache.org/jira/browse/SPARK-28843
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 2.4.3, 2.3.3, 3.0.0
            Reporter: Ryan Blue


While testing hardware with more cores, we found that the amount of memory required by PySpark
applications increased, and we traced the problem to importing numpy. The numpy issue is [https://github.com/numpy/numpy/issues/10455]

NumPy uses OpenMP, which starts a thread pool with one thread per core on the machine (and does
not respect cgroup limits). When we set this value lower, we see a reduction in memory consumption.

This parallelism setting should be set to the number of cores allocated to the executor, not
the number of cores available on the machine.
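A minimal sketch of the workaround described above, assuming an executor with 4 allocated cores (the value is an assumption for illustration): the variable must be set before numpy's first import, because OpenMP sizes its thread pool at import time.

```python
import os

# Cap OpenMP's thread pool before numpy is first imported; by default
# it creates one thread per core visible on the machine, ignoring cgroups.
# "4" here stands in for the executor's allocated core count (assumed).
os.environ["OMP_NUM_THREADS"] = "4"

import numpy as np  # OpenMP now caps its pool at 4 threads

print(os.environ["OMP_NUM_THREADS"])
```

In a real PySpark job, the same effect can be achieved by forwarding the variable to executors via Spark's executor-environment config, e.g. `--conf spark.executorEnv.OMP_NUM_THREADS=4` matched to `spark.executor.cores=4`, so worker Python processes inherit it before importing numpy.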



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

