spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-25004) Add spark.executor.pyspark.memory config to set resource.RLIMIT_AS
Date Thu, 02 Aug 2018 19:34:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567396#comment-16567396 ]

Apache Spark commented on SPARK-25004:
--------------------------------------

User 'rdblue' has created a pull request for this issue:
https://github.com/apache/spark/pull/21977

> Add spark.executor.pyspark.memory config to set resource.RLIMIT_AS
> ------------------------------------------------------------------
>
>                 Key: SPARK-25004
>                 URL: https://issues.apache.org/jira/browse/SPARK-25004
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Priority: Major
>
> Some platforms support limiting Python's addressable memory space by limiting [{{resource.RLIMIT_AS}}|https://docs.python.org/3/library/resource.html#resource.RLIMIT_AS].
> We've found that adding a limit is very useful when running in YARN because when Python
> doesn't know about memory constraints, it doesn't know when to garbage collect and will continue
> using memory when it doesn't need to. Adding a limit reduces PySpark memory consumption and
> avoids YARN killing containers because Python hasn't cleaned up memory.
> This also improves error messages for users, allowing them to see when Python is allocating
> too much memory instead of YARN killing the container:
> {code:lang=python}
>   File "build/bdist.linux-x86_64/egg/package/library.py", line 265, in fe_engineer
>     fe_eval_rec.update(f(src_rec_prep, mat_rec_prep))
>   File "build/bdist.linux-x86_64/egg/package/library.py", line 163, in fe_comp
>     comparisons = EvaluationUtils.leven_list_compare(src_rec_prep.get(item, []), mat_rec_prep.get(item, []))
>   File "build/bdist.linux-x86_64/egg/package/evaluationutils.py", line 25, in leven_list_compare
>     permutations = sorted(permutations, reverse=True)
>   MemoryError
> {code}
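
For readers unfamiliar with resource.RLIMIT_AS, the mechanism described above can be sketched roughly as follows. This is only an illustration using Python's standard resource module on a platform that supports RLIMIT_AS (for example Linux); it is not the code from the pull request, and the helper names limit_address_space and try_allocate are hypothetical.

```python
import resource


def limit_address_space(limit_bytes):
    """Lower the soft RLIMIT_AS of this process; return the previous (soft, hard) pair.

    Hypothetical helper for illustration only, not Spark's implementation.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # The soft limit may not exceed an existing finite hard limit.
    if hard != resource.RLIM_INFINITY:
        limit_bytes = min(limit_bytes, hard)
    # Only the soft limit is changed, so the caller can restore it later.
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
    return soft, hard


def try_allocate(n_bytes):
    """Return True if a buffer of n_bytes can be allocated, False on MemoryError."""
    try:
        buf = bytearray(n_bytes)
    except MemoryError:
        # With an address-space limit in place, an oversized allocation fails
        # inside Python instead of the container being killed from outside.
        return False
    del buf
    return True
```

With a limit in place, a call such as try_allocate(16 * 1024 ** 3) under a 4 GiB limit would be expected to return False, which corresponds to the MemoryError in the traceback above rather than a container kill by YARN.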



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

