spark-issues mailing list archives

From "Terry Kim (Jira)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-31625) Unregister application from YARN resource manager outside the shutdown hook
Date Sat, 02 May 2020 03:02:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-31625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Terry Kim updated SPARK-31625:
------------------------------
    Description: 
Currently, an application is unregistered from the YARN resource manager inside a shutdown hook.
If the shutdown hook does not run to completion (e.g., because it times out), the application
is never unregistered, and YARN resubmits the application even though it succeeded.

For example, you may see the following in the driver log:
{code:java}
20/04/30 06:20:29 INFO SparkContext: Successfully stopped SparkContext
20/04/30 06:20:29 INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
20/04/30 06:20:59 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
	at java.util.concurrent.FutureTask.get(FutureTask.java:205)
	at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
{code}
On the YARN RM side:
{code:java}
2020-04-30 06:21:25,083 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1588227360159_0001_01_000001 Container Transitioned from RUNNING to COMPLETED
2020-04-30 06:21:25,085 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1588227360159_0001_000001 with final state: FAILED, and exit status: 0
2020-04-30 06:21:25,085 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1588227360159_0001_000001 State change from RUNNING to FINAL_SAVING on event = CONTAINER_FINISHED
{code}
You can see that the final state of the application attempt becomes FAILED because the container
finishes before the application is unregistered.

  was:
Currently, an application is unregistered from YARN resource manager as a shutdown hook. In
the scenario where the shutdown hook does not run (e.g., timeouts, etc.), the application
is not unregistered, resulting in YARN resubmitting the application even if it succeeded.

For example, you could see the following on the driver log:
{code:java}
20/04/30 06:20:29 INFO SparkContext: Successfully stopped SparkContext
20/04/30 06:20:29 INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
20/04/30 06:20:59 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
	at java.util.concurrent.FutureTask.get(FutureTask.java:205)
	at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
{code}
On the YARN RM side:
{code:java}
2020-04-30 06:21:25,083 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1588227360159_0001_01_000001 Container Transitioned from RUNNING to COMPLETED
2020-04-30 06:21:25,085 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1588227360159_0001_000001 with final state: FAILED, and exit status: 0
2020-04-30 06:21:25,085 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1588227360159_0001_000001 State change from RUNNING to FINAL_SAVING on event = CONTAINER_FINISHED
{code}
You see the final state of the application becomes FAILED since container is finished before
the application is unregistered.


> Unregister application from YARN resource manager outside the shutdown hook
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-31625
>                 URL: https://issues.apache.org/jira/browse/SPARK-31625
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 3.1.0
>            Reporter: Terry Kim
>            Priority: Major
>
> Currently, an application is unregistered from the YARN resource manager inside a shutdown hook.
> If the shutdown hook does not run to completion (e.g., because it times out), the application
> is never unregistered, and YARN resubmits the application even though it succeeded.
> For example, you may see the following in the driver log:
> {code:java}
> 20/04/30 06:20:29 INFO SparkContext: Successfully stopped SparkContext
> 20/04/30 06:20:29 INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
> 20/04/30 06:20:59 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
> java.util.concurrent.TimeoutException
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:205)
> 	at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
> 	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
> {code}
> On the YARN RM side:
> {code:java}
> 2020-04-30 06:21:25,083 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1588227360159_0001_01_000001 Container Transitioned from RUNNING to COMPLETED
> 2020-04-30 06:21:25,085 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1588227360159_0001_000001 with final state: FAILED, and exit status: 0
> 2020-04-30 06:21:25,085 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1588227360159_0001_000001 State change from RUNNING to FINAL_SAVING on event = CONTAINER_FINISHED
> {code}
> You can see that the final state of the application attempt becomes FAILED because the container
> finishes before the application is unregistered.
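
The failure mode described in the issue can be illustrated with a small, self-contained
sketch. It is not Spark or Hadoop code: FakeResourceManager, runHookWithTimeout, and the
two scenario methods are hypothetical names invented for illustration, and the FINISHED
and FAILED strings are a simplification of the attempt states seen in the RM log. The
sketch assumes only what the stack trace shows, namely that shutdown hooks run under a
time limit enforced with Future.get, and compares unregistering inside the hook with
unregistering before the hook runs.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

class UnregisterSketch {

    // Hypothetical stand-in for the resource manager's view of one application.
    static final class FakeResourceManager {
        private final AtomicBoolean unregistered = new AtomicBoolean(false);

        void unregister() {
            unregistered.set(true);
        }

        // Simplified rule taken from the logs above: if the container exits
        // before the application has unregistered, the attempt is FAILED.
        String finalStateOnContainerExit() {
            return unregistered.get() ? "FINISHED" : "FAILED";
        }
    }

    // Runs a hook under a time limit, as the Future.get call in the stack
    // trace suggests. Returns true only if the hook completed in time.
    static boolean runHookWithTimeout(Runnable hook, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(hook);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the hook, it ran out of time
            return false;
        } catch (InterruptedException | ExecutionException e) {
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    // Simulates slow cleanup work that precedes unregistration in the hook.
    static void slowCleanup(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }

    // Scenario 1: unregistration happens inside the shutdown hook.
    static String unregisterInsideHook(long cleanupMs, long timeoutMs) {
        FakeResourceManager rm = new FakeResourceManager();
        runHookWithTimeout(() -> {
            slowCleanup(cleanupMs);
            if (!Thread.currentThread().isInterrupted()) {
                rm.unregister(); // never reached when the hook times out
            }
        }, timeoutMs);
        return rm.finalStateOnContainerExit();
    }

    // Scenario 2: unregistration happens before the shutdown hook runs.
    static String unregisterBeforeHook(long cleanupMs, long timeoutMs) {
        FakeResourceManager rm = new FakeResourceManager();
        rm.unregister();
        runHookWithTimeout(() -> slowCleanup(cleanupMs), timeoutMs);
        return rm.finalStateOnContainerExit();
    }

    public static void main(String[] args) {
        System.out.println("unregister inside hook: " + unregisterInsideHook(500, 50));
        System.out.println("unregister before hook: " + unregisterBeforeHook(500, 50));
    }
}
```

With a 500 ms cleanup and a 50 ms hook timeout, the first scenario reports FAILED and the
second reports FINISHED. This matches the reasoning in the issue: the timeout only matters
when unregistration is one of the steps the hook has to reach.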



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

