hadoop-mapreduce-dev mailing list archives

From "Haibo Chen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-6675) TestJobImpl.testUnusableNode failed
Date Thu, 14 Apr 2016 23:31:25 GMT
Haibo Chen created MAPREDUCE-6675:
-------------------------------------

             Summary: TestJobImpl.testUnusableNode failed 
                 Key: MAPREDUCE-6675
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6675
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 2.7.3
            Reporter: Haibo Chen
            Assignee: Haibo Chen


TestJobImpl#testUnusableNodeTransition is flaky.

2016-02-13 09:16:42 Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.324 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
2016-02-13 09:16:50 testUnusableNodeTransition(org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl) Time elapsed: 5.165 sec  <<< FAILURE!
2016-02-13 09:16:50 java.lang.AssertionError: expected:<SUCCEEDED> but was:<ERROR>
2016-02-13 09:16:50 	at org.junit.Assert.fail(Assert.java:88)
2016-02-13 09:16:50 	at org.junit.Assert.failNotEquals(Assert.java:743)
2016-02-13 09:16:50 	at org.junit.Assert.assertEquals(Assert.java:118)
2016-02-13 09:16:50 	at org.junit.Assert.assertEquals(Assert.java:144)
2016-02-13 09:16:50 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:977)
2016-02-13 09:16:50 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:627)
2016-02-13 09:16:50 
2016-02-13 09:16:50 
2016-02-13 09:16:50 Results :
2016-02-13 09:16:50 
2016-02-13 09:16:50 Failed tests: 
2016-02-13 09:16:50   TestJobImpl.testUnusableNodeTransition:627->assertJobState:977 expected:<SUCCEEDED> but was:<ERROR>
2016-02-13 09:16:50 
2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0.


Looking at the code, a JobUpdatedNodesEvent is handled by putting a TaskAttemptKill event
on the async dispatcher queue and returning immediately, but that event might not have been
processed by the time all JobTaskEvents are seen by the job (the jobTaskSucceeded events are
handed to the Job immediately, without going through the dispatcher). Therefore, there is a
slight chance that the job will see all three succeeded attempts and transition to the
Committing state before the TaskAttemptKill event is handled by the dispatcher. A committing
job rejects any JobTaskEvents it receives later, which causes the failure.
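
The ordering problem described above can be illustrated with a small, deterministic toy
model. This is only a sketch under stated assumptions: DispatcherRaceSketch, ToyJob, the
event names and the drainBeforeSuccessEvents flag are all hypothetical and are not Hadoop's
JobImpl, AsyncDispatcher or the actual test code. The queue is drained by hand so that the
flag stands in for the timing of the real dispatcher thread.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model only: hypothetical classes, not Hadoop's JobImpl or AsyncDispatcher.
class DispatcherRaceSketch {
    enum JobState { RUNNING, COMMITTING, ERROR }
    enum EventType { TASK_SUCCEEDED, TASK_ATTEMPT_KILLED }

    static class ToyJob {
        private final int totalTasks;
        private int succeededTasks = 0;
        JobState state = JobState.RUNNING;

        ToyJob(int totalTasks) { this.totalTasks = totalTasks; }

        void handle(EventType event) {
            // Assumption: any event arriving after the job has left RUNNING is rejected.
            if (state != JobState.RUNNING) {
                state = JobState.ERROR;
                return;
            }
            // Once every task has reported success, the job starts committing.
            if (event == EventType.TASK_SUCCEEDED && ++succeededTasks == totalTasks) {
                state = JobState.COMMITTING;
            }
        }
    }

    // Delivers every queued event to the job, as a dispatcher thread eventually would.
    static void drain(Queue<EventType> queue, ToyJob job) {
        while (!queue.isEmpty()) {
            job.handle(queue.poll());
        }
    }

    static JobState run(boolean drainBeforeSuccessEvents) {
        ToyJob job = new ToyJob(3);
        Queue<EventType> dispatcherQueue = new ArrayDeque<>();

        // The node update only enqueues the kill event and returns immediately.
        dispatcherQueue.add(EventType.TASK_ATTEMPT_KILLED);

        if (drainBeforeSuccessEvents) {
            drain(dispatcherQueue, job); // kill is handled while the job is still RUNNING
        }

        // Success events are handed to the job directly, bypassing the queue.
        for (int i = 0; i < 3; i++) {
            job.handle(EventType.TASK_SUCCEEDED);
        }

        drain(dispatcherQueue, job); // a late kill reaches a job that is already COMMITTING
        return job.state;
    }

    public static void main(String[] args) {
        System.out.println("kill handled first: " + run(true));
        System.out.println("kill handled last:  " + run(false));
    }
}
```

In this model, run(true) leaves the job in COMMITTING, while run(false) ends in ERROR,
which mirrors the "expected:<SUCCEEDED> but was:<ERROR>" symptom only loosely, since the
toy job stops at COMMITTING rather than modelling the full job lifecycle.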



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
