spark-issues mailing list archives

From "Apache Spark (Jira)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-31418) Blacklisting feature aborts Spark job without retrying for max num retries in case of Dynamic allocation
Date Fri, 01 May 2020 16:52:02 GMT

     [ https://issues.apache.org/jira/browse/SPARK-31418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-31418:
------------------------------------

    Assignee:     (was: Apache Spark)

> Blacklisting feature aborts Spark job without retrying for max num retries in case of Dynamic allocation
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-31418
>                 URL: https://issues.apache.org/jira/browse/SPARK-31418
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.0, 2.4.5
>            Reporter: Venkata krishnan Sowrirajan
>            Priority: Major
>
> With Spark blacklisting, if a task fails on an executor, that executor is blacklisted for the task. To retry the task, Spark checks whether there is an idle blacklisted executor that can be killed and replaced. If there is none, it aborts the job without retrying the task the maximum number of times.
> With dynamic allocation this can be improved: instead of killing an idle blacklisted executor (it is possible that there is none), Spark could request an additional executor and retry the task.
> This can be reproduced easily with a simple job like the one below. The example is expected to fail eventually; it only shows that the task is not retried spark.task.maxFailures times:
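> {code:scala}
> // Casting a boxed Int to String throws ClassCastException at runtime,
> // so every task in this job fails.
> def test(a: Int) = { a.asInstanceOf[String] }
> sc.parallelize(1 to 10, 10).map(x => test(x)).collect
> {code}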
> Run the job with dynamic allocation enabled and the minimum number of executors set to 1. There are various other cases where this can fail as well.
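> As a rough sketch of the setup described above (not part of the original report), the configuration might look like the following. Only dynamic allocation and the minimum of 1 executor come from the description; enabling blacklisting explicitly, the external shuffle service setting, the maxFailures value, and the application name are assumptions made for illustration:
> {code:scala}
> import org.apache.spark.{SparkConf, SparkContext}
>
> // Hypothetical reproduction setup; the master URL is expected to come from spark-submit.
> val conf = new SparkConf()
>   .setAppName("SPARK-31418-repro")                  // assumed name, not from the report
>   .set("spark.dynamicAllocation.enabled", "true")   // from the description
>   .set("spark.dynamicAllocation.minExecutors", "1") // from the description
>   .set("spark.shuffle.service.enabled", "true")     // assumption: commonly required with dynamic allocation in Spark 2.x
>   .set("spark.blacklist.enabled", "true")           // assumption: blacklisting switched on explicitly
>   .set("spark.task.maxFailures", "4")               // assumption: 4 is the documented default
>
> val sc = new SparkContext(conf)
> {code}
> The same keys can be passed as --conf options to spark-shell, in which case the provided sc is used instead of creating a new context.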



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org