spark-issues mailing list archives

From "praveentallapudi (Jira)" <>
Subject [jira] [Commented] (SPARK-22876) spark.yarn.am.attemptFailuresValidityInterval does not work correctly
Date Thu, 22 Aug 2019 00:05:00 GMT


praveentallapudi commented on SPARK-22876:

Hi Nikita, thanks for raising this issue. We are hitting the same problem. What alternative route did
you take? Is there any other approach to restarting Spark jobs?

> spark.yarn.am.attemptFailuresValidityInterval does not work correctly
> ---------------------------------------------------------------------
>                 Key: SPARK-22876
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.2.0
>         Environment: hadoop version 2.7.3
>            Reporter: Jinhan Zhong
>            Priority: Minor
>              Labels: bulk-closed
> I assume we can use spark.yarn.maxAppAttempts together with
spark.yarn.am.attemptFailuresValidityInterval to make a long-running application avoid stopping
after an acceptable number of failures.
> But after testing, I found that the application always stops after failing n times (n is the
minimum of spark.yarn.maxAppAttempts and yarn.resourcemanager.am.max-attempts from the
client yarn-site.xml)
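If I read the report correctly, the behavior the reporter measured reduces to taking the smaller of the two limits. A minimal sketch of that observed interaction (the function name is mine, not Spark's, and this describes the reported behavior, not a documented contract):

```python
def effective_max_attempts(spark_max_app_attempts, yarn_rm_max_attempts):
    # Per the report, YARN caps an application's AM attempts at its own
    # yarn.resourcemanager.am.max-attempts, so the smaller of the two
    # settings wins, regardless of any validity-interval setting.
    return min(spark_max_app_attempts, yarn_rm_max_attempts)

# spark.yarn.maxAppAttempts=20 against a resource-manager limit of 2:
print(effective_max_attempts(20, 2))  # -> 2
```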
> For example, the following setup should allow the application master to fail 20 times:
> * spark.yarn.maxAppAttempts=20
> * yarn client:
> * yarn resource manager:
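For anyone reproducing this, settings like the above are typically passed as `--conf` pairs to spark-submit. A small sketch of building those arguments; the values here are hypothetical stand-ins, since the original values were lost from the archived bullets:

```python
def to_submit_args(confs):
    """Render a {key: value} config dict as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(confs.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args

# Hypothetical example values, not the reporter's actual setup.
confs = {
    "spark.yarn.maxAppAttempts": "20",
    "spark.yarn.am.attemptFailuresValidityInterval": "1h",
}
print(" ".join(to_submit_args(confs)))
```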
> And after checking the source code, I found that in ApplicationMaster.scala there is a
ShutdownHook that checks the attempt id against maxAppAttempts; if the attempt id >=
maxAppAttempts, it tries to unregister the application, and the application will stop.
> Is this an expected design or a bug?
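The check the reporter describes can be sketched as follows; this is a hedged Python mock of the logic, and the class, field, and method names are illustrative, not Spark's actual identifiers:

```python
class ShutdownHookSketch:
    """Sketch of the shutdown-hook check the reporter describes in
    ApplicationMaster.scala (names are illustrative, not Spark's)."""

    def __init__(self, attempt_id, max_app_attempts):
        self.attempt_id = attempt_id
        self.max_app_attempts = max_app_attempts
        self.unregistered = False

    def run(self):
        # On the last allowed attempt, unregister the application so that
        # YARN marks it finished instead of scheduling another AM attempt.
        if self.attempt_id >= self.max_app_attempts:
            self.unregistered = True  # stand-in for the real unregister call
        return self.unregistered

print(ShutdownHookSketch(attempt_id=2, max_app_attempts=2).run())   # -> True
print(ShutdownHookSketch(attempt_id=1, max_app_attempts=20).run())  # -> False
```

If the hook only compares the raw attempt id and never consults the validity interval, that would explain why expired failures still count toward the limit.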

This message was sent by Atlassian Jira

