tez-dev mailing list archives

From Oleg Zhurakousky <ozhurakou...@hortonworks.com>
Subject Re: Deadlock in DAGAppMaster during shutdown.
Date Tue, 10 Jun 2014 11:25:06 GMT
Subroto,

Thanks for pointing this out. This issue and the TezClient issue you pointed out in your previous email are both being actively addressed.

Oleg

On Jun 10, 2014, at 5:42 AM, Subroto Sanyal <sanyalsubroto@gmail.com> wrote:

> In the class AMRMClientAsyncImpl, the object <7c3041e28> is locked by the
> heartbeat thread (which, like any heartbeat thread, runs an effectively
> infinite loop), while the method unregisterApplicationMaster is waiting to
> lock the same object.
> 
> Only once unregisterApplicationMaster acquires that lock can it tell the
> heartbeat thread to exit, by setting the boolean flag keepRunning to false.
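A minimal, self-contained Java model of this pattern may make the interaction easier to see. It is not the actual AMRMClientAsyncImpl source: the class HeartbeatLockModel, its methods and the latch-based allocate() stand-in are hypothetical. allocate() simply blocks until released, imitating an allocate call that keeps retrying (as in the RetryInvocationHandler frame of the dump) while the lock is held. Because unregister() sets keepRunning only inside the synchronized block, the stopping thread stays BLOCKED, and the flag stays true, for as long as the in-flight call runs.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the locking pattern in the quoted thread dump; not the Hadoop source.
class HeartbeatLockModel {
    private final Object unregisterHeartbeatLock = new Object();
    private volatile boolean keepRunning = true;
    // Counted down once the heartbeat thread is inside the lock.
    final CountDownLatch insideAllocate = new CountDownLatch(1);
    // Released by the caller to let the simulated allocate() return.
    final CountDownLatch allocateMayReturn = new CountDownLatch(1);

    boolean isRunning() {
        return keepRunning;
    }

    // Stand-in for an allocate() call stuck in retries while the lock is held.
    private void allocate() throws InterruptedException {
        insideAllocate.countDown();
        allocateMayReturn.await();
    }

    Thread startHeartbeat() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    synchronized (unregisterHeartbeatLock) {
                        if (!keepRunning) {
                            return;
                        }
                        allocate();
                    }
                    Thread.sleep(10); // heartbeat interval, outside the lock
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "AMRM Heartbeater thread");
        t.start();
        return t;
    }

    // Same shape as the quoted unregisterApplicationMaster: the flag is set only after taking the lock.
    void unregister() {
        synchronized (unregisterHeartbeatLock) {
            keepRunning = false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HeartbeatLockModel m = new HeartbeatLockModel();
        Thread heartbeat = m.startHeartbeat();
        m.insideAllocate.await();
        Thread stopper = new Thread(m::unregister, "AMShutdownThread");
        stopper.start();
        stopper.join(100); // give the stopper time to reach the synchronized block
        System.out.println("AMShutdownThread state: " + stopper.getState());
        System.out.println("keepRunning still true: " + m.isRunning());
        m.allocateMayReturn.countDown(); // let the in-flight call finish
        stopper.join();
        heartbeat.join();
    }
}
```

In this model the stopping thread is only released once the simulated call returns; in the real dump that depends on when the retrying allocate call gives up.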
> 
> The following is the thread dump for the deadlock:
> 
> "AMShutdownThread" daemon prio=5 tid=7f9a02921800 nid=0x115d68000 waiting for monitor entry [115d67000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.unregisterApplicationMaster(AMRMClientAsyncImpl.java:156)
>         - waiting to lock <7c3041e28> (a java.lang.Object)
>         at org.apache.tez.dag.app.rm.TaskScheduler.serviceStop(TaskScheduler.java:394)
>         - locked <7c3006aa0> (a org.apache.tez.dag.app.rm.TaskScheduler)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         - locked <7c3038008> (a java.lang.Object)
>         at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.serviceStop(TaskSchedulerEventHandler.java:357)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         - locked <7c2f71360> (a java.lang.Object)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1518)
>         at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:1649)
>         - locked <7c2f51790> (a org.apache.tez.dag.app.DAGAppMaster)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         - locked <7c2fed728> (a java.lang.Object)
>         at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler$AMShutdownRunnable.run(DAGAppMaster.java:607)
>         at java.lang.Thread.run(Thread.java:695)
> 
> 
> "AMRM Heartbeater thread" prio=5 tid=7f9a0c0e8800 nid=0x111e70000 waiting on condition [111e6f000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.util.ThreadUtil.sleepAtLeastIgnoreInterrupts(ThreadUtil.java:43)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:150)
>         at com.sun.proxy.$Proxy9.allocate(Unknown Source)
>         at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:246)
>         at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
>         - locked <7c3041e28> (a java.lang.Object)
> 
> public void unregisterApplicationMaster(FinalApplicationStatus appStatus,
>     String appMessage, String appTrackingUrl) throws YarnException,
>     IOException {
>   synchronized (unregisterHeartbeatLock) {
>     keepRunning = false;
>     client.unregisterApplicationMaster(appStatus, appMessage,
>         appTrackingUrl);
>   }
> }
> 
> 
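> The line "keepRunning = false" should be moved outside (before) the
> synchronized block, so that the heartbeat thread can be told to stop
> without unregisterApplicationMaster first having to acquire
> unregisterHeartbeatLock.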
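A sketch of the proposed change, again using a hypothetical model class (FlagOutsideLockModel) rather than the real AMRMClientAsyncImpl. In this model keepRunning is declared volatile, an assumption that matters here because the write is no longer protected by the lock, and it is cleared before unregister() takes unregisterHeartbeatLock. The heartbeat thread therefore sees the flag even while the stopping thread is still waiting for the lock. Note that the in-flight allocate() stand-in still has to return before the lock is released; in this model the change only guarantees that no further heartbeat starts once the flag is cleared.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the proposed change; class and method names are illustrative, not Hadoop APIs.
class FlagOutsideLockModel {
    private final Object unregisterHeartbeatLock = new Object();
    // volatile so the heartbeat thread sees the update without taking the lock.
    private volatile boolean keepRunning = true;
    final CountDownLatch insideAllocate = new CountDownLatch(1);
    final CountDownLatch allocateMayReturn = new CountDownLatch(1);
    final AtomicInteger allocateCalls = new AtomicInteger();
    final AtomicInteger unregisterCalls = new AtomicInteger();

    boolean isRunning() {
        return keepRunning;
    }

    // Stand-in for allocate(): blocks until released, like a call stuck in retries.
    private void allocate() throws InterruptedException {
        allocateCalls.incrementAndGet();
        insideAllocate.countDown();
        allocateMayReturn.await();
    }

    Thread startHeartbeat() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    synchronized (unregisterHeartbeatLock) {
                        if (!keepRunning) {
                            return; // no heartbeat is sent once the flag is cleared
                        }
                        allocate();
                    }
                    Thread.sleep(10); // heartbeat interval, outside the lock
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "AMRM Heartbeater thread");
        t.start();
        return t;
    }

    // Proposed shape: clear the flag first, then take the lock only for the unregister call.
    void unregister() {
        keepRunning = false;
        synchronized (unregisterHeartbeatLock) {
            unregisterCalls.incrementAndGet(); // stand-in for client.unregisterApplicationMaster(...)
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FlagOutsideLockModel m = new FlagOutsideLockModel();
        Thread heartbeat = m.startHeartbeat();
        m.insideAllocate.await();
        Thread stopper = new Thread(m::unregister, "AMShutdownThread");
        stopper.start();
        stopper.join(100); // the heartbeat thread still holds the lock here
        System.out.println("flag cleared while lock is held: " + !m.isRunning());
        m.allocateMayReturn.countDown();
        stopper.join();
        heartbeat.join();
        System.out.println("allocate calls: " + m.allocateCalls.get());
    }
}
```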
> 
> I am not sure whether this should be regarded as a problem in YARN or in
> Tez. The flag is private, so the Tez implementation, TezAMRMClientAsync,
> cannot access it.
> 
> 
> -- 
> Cheers,
> Subroto Sanyal


