spark-issues mailing list archives

From "Saisai Shao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-12447) Only update AM's internal state when executor is successfully launched by NM
Date Fri, 25 Dec 2015 07:55:49 GMT

     [ https://issues.apache.org/jira/browse/SPARK-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saisai Shao updated SPARK-12447:
--------------------------------
    Description: 
Currently, {{YarnAllocator}} updates its managed state, such as {{numExecutorsRunning}}, as soon as
a container is allocated, before the executor has actually been launched in it.

This is a problem when the executor fails to launch, for example because the Spark configuration
is wrong (such as the spark_shuffle aux-service occasionally not being configured on the NM), or
because the NM is lost while the NMClient is communicating with it.

In the current implementation, the state is updated even when the executor fails to launch, which
leaves the AM with an incorrect view of its executors. In addition, the lingering container is
only released after a timeout, which wastes resources.

So the state should be updated only after the executor has been launched successfully; otherwise,
the container should be released as soon as possible so that the launch fails fast and can be
retried.
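
As a rough illustration of the proposed ordering, the following Python sketch models an allocator
that counts an executor as running only after its launch succeeds, and releases the container
immediately when the launch fails. It is a simplified, synchronous model, not Spark's actual Scala
{{YarnAllocator}}: the class {{AllocatorModel}}, its fields, and the {{launch}} callback (standing
in for the NMClient start-container call) are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class AllocatorModel:
    """Hypothetical, simplified stand-in for the AM-side allocator state."""

    num_executors_running: int = 0
    running_containers: Set[str] = field(default_factory=set)
    released_containers: List[str] = field(default_factory=list)

    def on_container_allocated(self, container_id: str,
                               launch: Callable[[str], None]) -> bool:
        """Launch an executor in the container; update state only on success."""
        try:
            # Hypothetical launch callback standing in for the NMClient call;
            # it raises if the executor cannot be started.
            launch(container_id)
        except Exception:
            # Failed launch: release the container immediately (fail fast)
            # and leave the running-executor count unchanged.
            self.released_containers.append(container_id)
            return False
        # Successful launch: only now record the executor as running.
        self.running_containers.add(container_id)
        self.num_executors_running += 1
        return True


def simulated_launch(container_id: str) -> None:
    """Hypothetical launcher that fails for one container, e.g. a misconfigured NM."""
    if container_id == "container_2":
        raise RuntimeError("simulated executor launch failure")


allocator = AllocatorModel()
for cid in ["container_1", "container_2", "container_3"]:
    allocator.on_container_allocated(cid, simulated_launch)

print(allocator.num_executors_running)  # 2
print(allocator.released_containers)    # ['container_2']
```

If the counter were incremented before {{launch}} is attempted, as in the behavior described
above, the same run would report 3 running executors even though only 2 started. A real allocator
may launch containers asynchronously, in which case the state update would move into the launch
success callback rather than following the call directly.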

  was:
Currently, {{YarnAllocator}} updates its managed state, such as {{numExecutorsRunning}}, as soon as
a container is allocated, before the executor has actually been launched in it.

This is a problem when the executor fails to launch because the Spark configuration is wrong, or
when the NM is lost while the NMClient is communicating with it.

In the current implementation, the state is updated even when the executor fails to launch, which
leaves the AM with an incorrect view of its executors. In addition, the lingering container is
only released after a timeout, which wastes resources.

So the state should be updated only after the executor has been launched successfully; otherwise,
the container should be released as soon as possible so that the launch fails fast and can be
retried.


> Only update AM's internal state when executor is successfully launched by NM
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-12447
>                 URL: https://issues.apache.org/jira/browse/SPARK-12447
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.6.0
>            Reporter: Saisai Shao
>            Assignee: Apache Spark
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)