tez-dev mailing list archives

From Subroto Sanyal <sanyalsubr...@gmail.com>
Subject Re: Deadlock in DAGAppMaster during shutdown.
Date Tue, 10 Jun 2014 11:54:23 GMT
Hi Oleg,


Thanks for confirming. Could you please point me to the Tez JIRA tickets
under which these two issues are being addressed?
I couldn't find the code changes for closing TezClient.


On Tue, Jun 10, 2014 at 1:25 PM, Oleg Zhurakousky <
ozhurakousky@hortonworks.com> wrote:

> Subroto
>
> Thanks for pointing this out.
> This and the TezClient issue you pointed out in your previous email are
> both being actively addressed.
>
> Oleg
>
> On Jun 10, 2014, at 5:42 AM, Subroto Sanyal <sanyalsubroto@gmail.com>
> wrote:
>
> > In the class AMRMClientAsyncImpl, the object <7c3041e28> is locked by the
> > heartbeat thread (which, like any heartbeat thread, runs a more or less
> > infinite loop). The method unregisterApplicationMaster is waiting to lock
> > that same object.
> >
> > Only once unregisterApplicationMaster has acquired that lock can it tell
> > the heartbeat thread to exit, by setting the boolean flag keepRunning to
> > false.
> >
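The blocking pattern described here can be reproduced in isolation. The following is only a minimal sketch, not the YARN or Tez code: the class name `MonitorContentionSketch`, the thread names, and the latch that stands in for a long-running call are all hypothetical. It shows that a thread waiting to enter a monitor held by another thread reports the state `BLOCKED`, the same state the shutdown thread shows in the dump.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical, self-contained sketch; none of these names come from YARN or Tez.
public class MonitorContentionSketch {

    // Returns the state observed for a thread that is waiting to enter a
    // monitor currently held by another thread.
    static Thread.State observeWaiterState() {
        final Object lock = new Object();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Stands in for the heartbeat thread: takes the lock and keeps it
        // while it is busy with a long call (simulated by the latch).
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                lockHeld.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "heartbeat-sketch");

        // Stands in for the shutdown thread: needs the same lock.
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // Nothing to do; acquiring the lock is the point.
            }
        }, "shutdown-sketch");

        try {
            holder.start();
            lockHeld.await();   // the holder now owns the monitor
            waiter.start();

            // Poll until the waiter is parked on the monitor (5 s upper bound).
            Thread.State state = waiter.getState();
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
                state = waiter.getState();
            }

            release.countDown(); // let both threads finish
            holder.join();
            waiter.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(observeWaiterState());
    }
}
```

In this sketch the waiter gets through only because the holder is released explicitly. If the holder never gave up the monitor, the waiter would stay blocked indefinitely.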
> > Following is the thread-dump for the deadlock:
> >
> > "AMShutdownThread" daemon prio=5 tid=7f9a02921800 nid=0x115d68000 waiting for monitor entry [115d67000]
> >    java.lang.Thread.State: BLOCKED (on object monitor)
> >     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.unregisterApplicationMaster(AMRMClientAsyncImpl.java:156)
> >     - waiting to lock <7c3041e28> (a java.lang.Object)
> >     at org.apache.tez.dag.app.rm.TaskScheduler.serviceStop(TaskScheduler.java:394)
> >     - locked <7c3006aa0> (a org.apache.tez.dag.app.rm.TaskScheduler)
> >     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> >     - locked <7c3038008> (a java.lang.Object)
> >     at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.serviceStop(TaskSchedulerEventHandler.java:357)
> >     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> >     - locked <7c2f71360> (a java.lang.Object)
> >     at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> >     at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> >     at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1518)
> >     at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:1649)
> >     - locked <7c2f51790> (a org.apache.tez.dag.app.DAGAppMaster)
> >     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> >     - locked <7c2fed728> (a java.lang.Object)
> >     at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler$AMShutdownRunnable.run(DAGAppMaster.java:607)
> >     at java.lang.Thread.run(Thread.java:695)
> >
> > "AMRM Heartbeater thread" prio=5 tid=7f9a0c0e8800 nid=0x111e70000 waiting on condition [111e6f000]
> >    java.lang.Thread.State: TIMED_WAITING (sleeping)
> >     at java.lang.Thread.sleep(Native Method)
> >     at org.apache.hadoop.util.ThreadUtil.sleepAtLeastIgnoreInterrupts(ThreadUtil.java:43)
> >     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:150)
> >     at com.sun.proxy.$Proxy9.allocate(Unknown Source)
> >     at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:246)
> >     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
> >     - locked <7c3041e28> (a java.lang.Object)
> >
> >   public void unregisterApplicationMaster(FinalApplicationStatus appStatus,
> >       String appMessage, String appTrackingUrl) throws YarnException,
> >       IOException {
> >     synchronized (unregisterHeartbeatLock) {
> >       keepRunning = false;
> >       client.unregisterApplicationMaster(appStatus, appMessage,
> >           appTrackingUrl);
> >     }
> >   }
> >
> >
> > The line "keepRunning = false" should be outside the synchronized block.
> >
> > I am not sure whether this should be regarded as a problem in YARN or in
> > Tez. The flag is private and can't be accessed by the Tez implementation,
> > TezAMRMClientAsync.
> >
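The change suggested above (setting the flag before taking the lock) can be sketched as follows. This is a hypothetical, simplified model and not the actual AMRMClientAsyncImpl code: the class `UnregisterFlagSketch`, the `unregistered` marker, and the sleeps that stand in for the allocate call and the heartbeat interval are assumptions made for illustration. Only the names `keepRunning` and `unregisterHeartbeatLock` are taken from the snippet in the message.

```java
// Hypothetical, simplified model of the pattern; not the YARN implementation.
public class UnregisterFlagSketch {
    private final Object unregisterHeartbeatLock = new Object();
    // volatile so a write made outside the lock is visible to the heartbeat loop
    private volatile boolean keepRunning = true;
    private volatile boolean unregistered = false;

    // Stand-in for the heartbeat thread body.
    private void heartbeatLoop() {
        try {
            while (true) {
                synchronized (unregisterHeartbeatLock) {
                    if (!keepRunning) {
                        return;          // exit as soon as the flag is seen
                    }
                    Thread.sleep(20);    // stand-in for the allocate call
                }
                Thread.sleep(5);         // stand-in for the heartbeat interval
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stand-in for unregisterApplicationMaster with the flag moved out of the block.
    void unregister() {
        keepRunning = false;             // set before waiting for the lock
        synchronized (unregisterHeartbeatLock) {
            unregistered = true;         // stand-in for the real unregister call
        }
    }

    // Returns true if unregister completed and the heartbeat thread exited.
    static boolean runDemo() {
        UnregisterFlagSketch sketch = new UnregisterFlagSketch();
        Thread heartbeat = new Thread(sketch::heartbeatLoop, "heartbeat-sketch");
        heartbeat.start();
        try {
            Thread.sleep(50);            // let a few heartbeats happen
            sketch.unregister();
            heartbeat.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return sketch.unregistered && !heartbeat.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("clean shutdown: " + runDemo());
    }
}
```

One caveat about this sketch: the flag only takes effect the next time the heartbeat loop checks it. If the call made while holding the lock never returns, the unregister path would still wait for the lock, so this models the suggestion rather than proving it is sufficient.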
> >
> > --
> > Cheers,
> > *Subroto Sanyal*
>
>



-- 
Cheers,
*Subroto Sanyal*
