hadoop-yarn-dev mailing list archives

From "Omkar Vinit Joshi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-1421) Node managers will not receive application finish event where containers ran before RM restart
Date Mon, 18 Nov 2013 20:25:28 GMT
Omkar Vinit Joshi created YARN-1421:
---------------------------------------

             Summary: Node managers will not receive application finish event where containers ran before RM restart
                 Key: YARN-1421
                 URL: https://issues.apache.org/jira/browse/YARN-1421
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Omkar Vinit Joshi
            Assignee: Omkar Vinit Joshi
            Priority: Critical


Problem :- Today, for every application, the RM tracks the node managers on which its
containers ran. When the application finishes, the RM notifies all of those node managers of
the application finish event (via the node manager heartbeat). However, if the RM restarts, it
loses this information, so those node managers never receive the application finish event and
keep reporting the finished applications.

Proposed Solution :- Instead of remembering the node managers where containers ran for a
particular application, it would be better to rely on the node manager heartbeat to make this
decision, i.e. when a node manager heartbeats saying it is running applications (app1, app2),
we should check those applications' status in the RM's memory {code}rmContext.getRMApps(){code}
and, if they are either not found (very old applications) or in a final state
(FINISHED, KILLED, FAILED), immediately notify the node manager about the application
finish event.
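
The proposed check could look roughly like the sketch below. It is only an illustration and is not taken from any patch: the class name HeartbeatCleanupSketch, the AppState enum, the plain String application ids and the appsToFinish method are all hypothetical stand-ins, and the rmApps map merely plays the role of the map returned by rmContext.getRMApps().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the proposal; none of these names are YARN APIs.
class HeartbeatCleanupSketch {

    // Assumed stand-in for the RM's application states.
    enum AppState { RUNNING, FINISHED, KILLED, FAILED }

    // The final states listed in the proposal.
    static final Set<AppState> FINAL_STATES =
            EnumSet.of(AppState.FINISHED, AppState.KILLED, AppState.FAILED);

    /**
     * Returns the applications, among those a node manager reports as running,
     * for which the RM should send an application finish event: applications
     * the RM no longer knows about, or applications already in a final state.
     */
    static List<String> appsToFinish(Map<String, AppState> rmApps,
                                     Collection<String> reportedByNode) {
        List<String> result = new ArrayList<>();
        for (String appId : reportedByNode) {
            AppState state = rmApps.get(appId); // null means "not found in RM memory"
            if (state == null || FINAL_STATES.contains(state)) {
                result.add(appId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, AppState> rmApps = new HashMap<>();
        rmApps.put("app1", AppState.RUNNING);
        rmApps.put("app2", AppState.FINISHED);
        // app3 is unknown to the RM, e.g. a very old application.
        System.out.println(appsToFinish(rmApps, Arrays.asList("app1", "app2", "app3")));
        // prints [app2, app3]
    }
}
```

Because the decision is derived from each heartbeat, nothing about past container placement has to survive an RM restart in this model.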



--
This message was sent by Atlassian JIRA
(v6.1#6144)
