hadoop-mapreduce-dev mailing list archives

From "John Thomason (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-7160) Job end notification not sent and client service not stopped after YarnRuntimeException in shutDownJob in MRAppMaster.java
Date Wed, 21 Nov 2018 00:15:00 GMT
John Thomason created MAPREDUCE-7160:
----------------------------------------

             Summary: Job end notification not sent and client service not stopped after YarnRuntimeException in shutDownJob in MRAppMaster.java
                 Key: MAPREDUCE-7160
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7160
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster
    Affects Versions: 3.0.3
            Reporter: John Thomason


If a YarnRuntimeException occurs during shutDownJob in MRAppMaster.java (in this case, at the line 'MRAppMaster.this.stop();'),
{code:java}
2018-11-20 12:38:46,173 INFO [Thread-171] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing event TASK_FINISHED
2018-11-20 12:38:46,173 ERROR [Thread-171] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error writing History Event: org.apache.hadoop.mapreduce.jobhistory.TaskFinishedEvent@3b1a7bb5
java.io.IOException: All datanodes [DatanodeInfoWithStorage[10.11.1.227:9866,DS-3539e7b8-5d87-45e5-880a-1897f11577d2,DISK]] are bad. Aborting...
    at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1561)
    at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1495)
    at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1481)
    at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
2018-11-20 12:38:46,174 INFO [Thread-171] org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler failed in state STOPPED
{code}
then the catch block
{code:java}
catch (Throwable t) {
  LOG.warn("Graceful stop failed. Exiting.. ", t);
  exitMRAppMaster(1, t);
}
{code}
causes the MRAppMaster to exit before the finally block can execute. As a result, the
job end notification is not sent, despite the change made in [https://jira.apache.org/jira/browse/MAPREDUCE-6895?attachmentOrder=desc].

Additionally, this finally block should also call 'clientService.stop();'. Otherwise, the
following errors occur on the client side:
{code:java}
2018-11-07 19:50:59,602 INFO mapreduce.Job: map 100% reduce 98%
2018-11-07 19:51:03,617 INFO mapreduce.Job: map 100% reduce 99%
2018-11-07 19:51:07,648 INFO mapreduce.Job: map 100% reduce 100%
2018-11-07 19:51:07,656 INFO mapreduce.Job: Job job_1541647829228_0001 completed successfully
2018-11-07 19:51:08,162 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2018-11-07 19:51:09,169 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-11-07 19:51:10,170 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-11-07 19:51:11,171 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
......
2018-11-07 19:51:38,407 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
java.io.IOException: java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:344)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:382)
    at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:859)
    at org.apache.hadoop.mapreduce.Job$8.run(Job.java:820)
    at org.apache.hadoop.mapreduce.Job$8.run(Job.java:817)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
    at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:817)
    at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1663)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1591)
    at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:334)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:343)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:244)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:158)
{code}
 

Adding calls to send the job end notification and stop the client service prior to 'exitMRAppMaster'
within the catch block fixes this issue.
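
The ordering described above can be sketched in a small, self-contained form. This is only an illustration under stated assumptions, not the actual MRAppMaster code or the submitted patch: the class ShutdownSketch, the helper cleanUpOnce, the failOnStop flag and the injected exit callback are all hypothetical, and the job end notification and client service stop are modelled as entries in an event list. The callback stands in for exitMRAppMaster, which would normally terminate the JVM and never return.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical stand-in for the shutdown path; none of these names come from Hadoop.
class ShutdownSketch {
    final List<String> events = new ArrayList<>();
    private final IntConsumer exit;     // stands in for exitMRAppMaster(status, t)
    private final boolean failOnStop;   // simulate a failure while stopping services
    private boolean cleanedUp = false;

    ShutdownSketch(IntConsumer exit, boolean failOnStop) {
        this.exit = exit;
        this.failOnStop = failOnStop;
    }

    // Models 'MRAppMaster.this.stop();' throwing during shutdown.
    private void stopServices() {
        if (failOnStop) {
            throw new RuntimeException("simulated failure while stopping services");
        }
    }

    // Send the notification and stop the client service at most once.
    private void cleanUpOnce() {
        if (cleanedUp) {
            return;
        }
        cleanedUp = true;
        events.add("sendJobEndNotify");
        events.add("clientService.stop");
    }

    void shutDownJob() {
        try {
            stopServices();
            cleanUpOnce();
        } catch (Throwable t) {
            // Clean up BEFORE exiting: a real exit would halt the JVM here,
            // so the finally block below would never run.
            cleanUpOnce();
            exit.accept(1);
            return; // only reached when exit is a stub that returns
        } finally {
            cleanUpOnce(); // no-op if the cleanup already happened
        }
        exit.accept(0);
    }

    public static void main(String[] args) {
        List<Integer> codes = new ArrayList<>();
        ShutdownSketch sketch = new ShutdownSketch(code -> codes.add(code), true);
        sketch.shutDownJob();
        System.out.println(sketch.events + " exit codes: " + codes);
    }
}
```

Because the cleanup is guarded by a flag, it is safe to call from the normal path, the catch block and the finally block without sending the notification twice.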



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
