tez-dev mailing list archives

From "Jonathan Eagles (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TEZ-3950) Preempted task attempts intermittently marked as FAILED instead of KILLED
Date Tue, 05 Jun 2018 17:04:00 GMT
Jonathan Eagles created TEZ-3950:
------------------------------------

             Summary: Preempted task attempts intermittently marked as FAILED instead of KILLED
                 Key: TEZ-3950
                 URL: https://issues.apache.org/jira/browse/TEZ-3950
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.9.2, 0.10.0
            Reporter: Jonathan Eagles
         Attachments: TEZ-3950.fail.patch

TestMockDAGAppMaster.testInternalPreemption intermittently fails with expected:<KILLED> but was:<FAILED>


The crux of the matter is that TaskSchedulerManager sends two events:

- TaskScheduler#deallocatedContainer -> TaskSchedulerManager#containerBeingReleased -> sends AMContainerStopRequest -> TA_CONTAINER_TERMINATING
- AMContainerEventCompleted -> TA_CONTAINER_TERMINATED_BY_SYSTEM

For the task attempt to be marked KILLED, the second path must complete first. Because the
first path is longer, the second path almost always completes first. When the first path
completes first, the task attempt is marked FAILED instead of KILLED.
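
The ordering dependence described above can be illustrated with a minimal sketch. This is a simplified, hypothetical model and not Tez's actual task attempt state machine: the class PreemptionRaceSketch, its State enum, and the finalState helper are invented for illustration, and it assumes that whichever of the two events arrives first decides the terminal state. Only the two event names come from this report.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the race; not Tez's real state machine.
public class PreemptionRaceSketch {

    // Event names taken from the report above.
    public enum Event { TA_CONTAINER_TERMINATING, TA_CONTAINER_TERMINATED_BY_SYSTEM }

    // Reduced set of states, assumed for illustration only.
    public enum State { RUNNING, KILLED, FAILED }

    // Assumption: the first event seen while RUNNING fixes the terminal state;
    // any event arriving after that is ignored.
    public static State apply(State current, Event event) {
        if (current != State.RUNNING) {
            return current;
        }
        switch (event) {
            case TA_CONTAINER_TERMINATED_BY_SYSTEM:
                return State.KILLED;
            case TA_CONTAINER_TERMINATING:
                return State.FAILED;
            default:
                return current;
        }
    }

    // Replays the events in the order they arrive and returns the final state.
    public static State finalState(List<Event> arrivalOrder) {
        State state = State.RUNNING;
        for (Event event : arrivalOrder) {
            state = apply(state, event);
        }
        return state;
    }

    public static void main(String[] args) {
        // Usual order: the shorter second path wins the race.
        System.out.println(finalState(Arrays.asList(
                Event.TA_CONTAINER_TERMINATED_BY_SYSTEM,
                Event.TA_CONTAINER_TERMINATING))); // prints KILLED

        // Rare order: the longer first path wins the race.
        System.out.println(finalState(Arrays.asList(
                Event.TA_CONTAINER_TERMINATING,
                Event.TA_CONTAINER_TERMINATED_BY_SYSTEM))); // prints FAILED
    }
}
```

Under these assumptions the same pair of events yields KILLED or FAILED depending only on arrival order, which matches the intermittent test failure reported here.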




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
