tez-dev mailing list archives

From Subroto Sanyal <sanyalsubr...@gmail.com>
Subject Deadlock in DAGAppMaster during shutdown.
Date Tue, 10 Jun 2014 09:42:57 GMT
In the class AMRMClientAsyncImpl, the object <7c3041e28> is locked by the
heartbeat thread (which, like any heartbeat thread, runs an effectively
infinite loop), and the method unregisterApplicationMaster is waiting to
lock the same object.

Only once unregisterApplicationMaster has acquired that lock can it tell
the heartbeat thread to exit, by setting the boolean flag keepRunning to
false.
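
The contention described above can be reproduced in isolation. The following
is a minimal sketch, not the YARN code: the class name, the thread names and
the latches are all hypothetical, and the "heartbeat" thread simply holds a
monitor while it waits, standing in for a call that does not return. It shows
that a second thread trying to enter the same monitor is reported as BLOCKED,
which is the state AMShutdownThread has in the dump.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone sketch; none of these names come from YARN or Tez.
final class BlockedOnMonitorSketch {
    // Stand-in for the lock object that both threads contend on.
    private static final Object LOCK = new Object();

    /** Returns the last state observed for the "shutdown" thread while the lock is held elsewhere. */
    static Thread.State observeShutdownThreadState() {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread heartbeat = new Thread(() -> {
            synchronized (LOCK) {
                lockHeld.countDown();
                try {
                    release.await(); // simulates a call that keeps the lock held
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "heartbeat-sketch");

        Thread shutdown = new Thread(() -> {
            synchronized (LOCK) {
                // The real method would set the stop flag and unregister here.
            }
        }, "shutdown-sketch");

        heartbeat.setDaemon(true);
        shutdown.setDaemon(true);
        try {
            heartbeat.start();
            lockHeld.await(); // make sure the heartbeat thread owns the lock first
            shutdown.start();

            // Poll (up to 5 seconds) until the shutdown thread is blocked on the monitor.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            Thread.State observed = shutdown.getState();
            while (observed != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
                observed = shutdown.getState();
            }

            release.countDown(); // let the heartbeat thread leave the monitor
            heartbeat.join();
            shutdown.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing threads", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("shutdown thread state: " + observeShutdownThreadState());
    }
}
```

In this sketch the shutdown thread only proceeds after the latch releases the
holder; as long as the holder never leaves the monitor, the waiter stays
blocked.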

Following is the thread-dump for the deadlock:

"AMShutdownThread" daemon prio=5 tid=7f9a02921800 nid=0x115d68000 waiting for monitor entry [115d67000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.unregisterApplicationMaster(AMRMClientAsyncImpl.java:156)
    - waiting to lock <7c3041e28> (a java.lang.Object)
    at org.apache.tez.dag.app.rm.TaskScheduler.serviceStop(TaskScheduler.java:394)
    - locked <7c3006aa0> (a org.apache.tez.dag.app.rm.TaskScheduler)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked <7c3038008> (a java.lang.Object)
    at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.serviceStop(TaskSchedulerEventHandler.java:357)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked <7c2f71360> (a java.lang.Object)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1518)
    at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:1649)
    - locked <7c2f51790> (a org.apache.tez.dag.app.DAGAppMaster)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked <7c2fed728> (a java.lang.Object)
    at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler$AMShutdownRunnable.run(DAGAppMaster.java:607)
    at java.lang.Thread.run(Thread.java:695)


"AMRM Heartbeater thread" prio=5 tid=7f9a0c0e8800 nid=0x111e70000 waiting on condition [111e6f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.util.ThreadUtil.sleepAtLeastIgnoreInterrupts(ThreadUtil.java:43)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:150)
    at com.sun.proxy.$Proxy9.allocate(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:246)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
    - locked <7c3041e28> (a java.lang.Object)

  public void unregisterApplicationMaster(FinalApplicationStatus appStatus,
      String appMessage, String appTrackingUrl) throws YarnException,
      IOException {
    synchronized (unregisterHeartbeatLock) {
      keepRunning = false;
      client.unregisterApplicationMaster(appStatus, appMessage, appTrackingUrl);
    }
  }


The line "keepRunning = false" should be outside the synchronized block.
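
The suggested change can be sketched as follows. This is an illustration
under stated assumptions, not the YARN implementation: the class
HeartbeatShutdownSketch, its helper methods and the 5 ms interval are
hypothetical, the allocate and unregister calls are replaced by stand-ins,
and keepRunning is assumed to be volatile so that the heartbeat loop sees the
write without holding the lock. Only the names unregisterHeartbeatLock and
keepRunning are taken from the method quoted above.

```java
// Hypothetical sketch of moving the flag write outside the synchronized block.
final class HeartbeatShutdownSketch {
    private final Object unregisterHeartbeatLock = new Object();
    // Assumption: volatile, so the heartbeat thread observes the write without the lock.
    private volatile boolean keepRunning = true;
    private volatile boolean unregistered = false;
    private Thread heartbeatThread;

    void start() {
        heartbeatThread = new Thread(() -> {
            while (true) {
                synchronized (unregisterHeartbeatLock) {
                    if (!keepRunning) {
                        return; // stop heartbeating once shutdown was requested
                    }
                    // Stand-in: the real loop would call client.allocate(...) here.
                }
                try {
                    Thread.sleep(5); // hypothetical heartbeat interval
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "heartbeat-sketch");
        heartbeatThread.setDaemon(true);
        heartbeatThread.start();
    }

    void unregisterApplicationMaster() {
        // The flag is set before taking the lock, as proposed above.
        keepRunning = false;
        synchronized (unregisterHeartbeatLock) {
            // Stand-in for client.unregisterApplicationMaster(...).
            unregistered = true;
        }
    }

    boolean isUnregistered() {
        return unregistered;
    }

    /** Waits up to the given time and reports whether the heartbeat thread has exited. */
    boolean heartbeatStoppedWithin(long millis) {
        try {
            heartbeatThread.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !heartbeatThread.isAlive();
    }

    public static void main(String[] args) {
        HeartbeatShutdownSketch sketch = new HeartbeatShutdownSketch();
        sketch.start();
        sketch.unregisterApplicationMaster();
        System.out.println("unregistered: " + sketch.isUnregistered()
            + ", heartbeat stopped: " + sketch.heartbeatStoppedWithin(2000));
    }
}
```

In this sketch the stop request no longer depends on acquiring the lock, so
the heartbeat loop exits at its next check of keepRunning. The stand-in for
the unregister call still runs under the lock, so it would still wait for any
heartbeat call that is in progress while holding it.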

I am not sure whether this should be regarded as a problem in YARN or in Tez.
The flag is private and cannot be accessed by the Tez implementation,
TezAMRMClientAsync.


-- 
Cheers,
*Subroto Sanyal*
