qpid-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rupert Smith" <rupertlssm...@googlemail.com>
Subject Re: [java] Deadlock issue in trunk
Date Mon, 17 Sep 2007 15:57:12 GMT
Merging that to trunk will fix Rajith's observed deadlock. As I just
volunteered to do some merging, the fix will appear on trunk soon.

Got to love that recursion till empty, followed by the more conventional
iteration of the 'closeAllSessions' method. God help anyone who tries to
read this code... but it works.

Rupert

On 17/09/2007, Robert Godfrey <rob.j.godfrey@gmail.com> wrote:
>
> M2 should have a fix in that gets all the messageDeliveryLocks when you do
> a
> Connection.close before it gets the mutex...
>
> Specifically, this nasty bit of recursion -
>
> public void close(List<AMQSession> sessions, long timeout) throws
> JMSException
>     {
>         synchronized(_sessionCreationLock)
>         {
>             if(!sessions.isEmpty())
>             {
>                 AMQSession session = sessions.remove(0);
>                 synchronized(session.getMessageDeliveryLock())
>                 {
>                     close(sessions, timeout);
>                 }
>             }
>             else
>
>
>
> ...
>
> -- Rob
>
> On 17/09/2007, Rupert Smith <rupertlssmith@googlemail.com> wrote:
> >
> > M2, Is it significantly different on trunk?
> >
> > On 17/09/2007, Robert Godfrey <rob.j.godfrey@gmail.com> wrote:
> > >
> > > Rupert - which branch of the code are you looking at?
> > >
> > > -- Rob
> > >
> > > On 17/09/2007, Rupert Smith <rupertlssmith@googlemail.com> wrote:
> > > >
> > > > Sorry, I'm talking rubbish. The ordering of the locks in
> > > AMQSession.closeis
> > > > fine, and as it should be. Message delivery, followed by failover
> > mutex.
> > > >
> > > > The problematic interaction is because the call to
> AMQConnection.close
> > ,
> > > > obtains the failover mutex first. I'd be tempted to try and
> eliminate
> > > that
> > > > one, so that as each session is closed delivery then failover is
> > > > obtained/released for each.
> > > >
> > > > You could remove the failover mutex from createTextMessage, but
> other
> > > > methods that do need to obtain it can be called from within a
> > > dispathcher
> > > > thread, so the problem can still occur.
> > > >
> > > > Rupert
> > > >
> > > > On 17/09/2007, Rupert Smith <rupertlssmith@googlemail.com> wrote:
> > > > >
> > > > > Also messageDeliveryLock seems like a poor choice of name.
> > > > > 'sessionCloseMutex' might be better.
> > > > >
> > > > > On 17/09/2007, Rupert Smith < rupertlssmith@googlemail.com>
wrote:
> > > > > >
> > > > > > Actually, I see it now, the message delivery lock is obtained
in
> > the
> > > > > > dispatcher run method.
> > > > > >
> > > > > > So createTextMessage is doing:
> > > > > >
> > > > > > 1. Obtain message delivery lock.
> > > > > > 2. Obtain failover mutex.
> > > > > >
> > > > > > And the main thread calling close is doing:
> > > > > >
> > > > > > 1. Obtain failover mutex.
> > > > > > 2. Obtain message delivery lock.
> > > > > >
> > > > > > Just swap the lock ordering as I suggested. Unless this
> conflicts
> > > with
> > > > > > other orderings...
> > > > > >
> > > > > > On 17/09/2007, Rupert Smith <rupertlssmith@googlemail.com>
> wrote:
> > > > > > >
> > > > > > > It looks to me that the message delivery lock is the more
> > > > fundamental
> > > > > > > cause of the dead lock. I can't figure out from the stack
> trace
> > > why
> > > > the
> > > > > > > dispatcher thread is holding that one (I suppose I should
look
> > at
> > > > what the
> > > > > > > test is doing to figure that out).
> > > > > > >
> > > > > > > Can we swap the order of obtaining the locks in
> AMQSession.close
> > > > (long
> > > > > > > timeout) from
> > > > > > >
> > > > > > > synchronized(_messageDeliveryLock)
> > > > > > > synchronized (_connection.getFailoverMutex())
> > > > > > >
> > > > > > > to
> > > > > > >
> > > > > > > synchronized (_connection.getFailoverMutex())
> > > > > > > synchronized(_messageDeliveryLock)
> > > > > > >
> > > > > > > ?
> > > > > > >
> > > > > > > Unless that causes a different dead lock, it looks like
it
> will
> > > fix
> > > > > > > the problem.
> > > > > > >
> > > > > > > Also, it looks to me that removing the failover mutex
> > acquisition
> > > > from
> > > > > > > createTextMessage will be harmless enough.
> > > > > > >
> > > > > > > Rupert
> > > > > > >
> > > > > > >
> > > > > > > On 15/09/2007, Rajith Attapattu < rajith77@gmail.com>
wrote:
> > > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > When I run the latest version of trunk I get a hang
on
> > > > > > > > org.apache.qpid.test.unit.client.forwardall.CombinedTest
> > > > > > > > kill -3 (thread dump) provides information to say
there is a
> > > > > > > > deadlock.
> > > > > > > >
> > > > > > > > I also noted that all AMQSession.createXXXMessage
contains
> > > > > > > > synchronization
> > > > > > > > code.
> > > > > > > > (in this case the deadlock happens inn
> > > > AMQSession.createTextMessagetrying
> > > > > > > > to obtain the failover mutex)
> > > > > > > >
> > > > > > > > Do we really need this synchronization code when creating
> > > > messages?
> > > > > > > > I am not
> > > > > > > > really convinced it is needed.
> > > > > > > > Can someody explain the reason for that code block?
> > > > > > > >
> > > > > > > > Regard,
> > > > > > > >
> > > > > > > > Rajith
> > > > > > > >
> > > > > > > > I have attached the trace below.
> > > > > > > >
> > > > > > > > Found one Java-level deadlock:
> > > > > > > > =============================
> > > > > > > > "Dispatcher-Channel-1":
> > > > > > > >   waiting to lock monitor 0x09ff950c (object 0x8b85a1a0,
a
> > > > > > > > java.lang.Object
> > > > > > > > ),
> > > > > > > >   which is held by "main"
> > > > > > > > "main":
> > > > > > > >   waiting to lock monitor 0x09ff964c (object 0x88ff0240,
a
> > > > > > > > java.lang.Object
> > > > > > > > ),
> > > > > > > >   which is held by "Dispatcher-Channel-1"
> > > > > > > >
> > > > > > > > Java stack information for the threads listed above:
> > > > > > > > ===================================================
> > > > > > > > "Dispatcher-Channel-1":
> > > > > > > >         at
> org.apache.qpid.client.AMQSession.createTextMessage
> > (
> > > > > > > > AMQSession.java:1057)
> > > > > > > >         - waiting to lock <0x8b85a1a0> (a java.lang.Object)
> > > > > > > >         at
> org.apache.qpid.client.AMQSession.createTextMessage(
> > > > > > > > AMQSession.java:1068)
> > > > > > > >         at
> > > > > > > >
> org.apache.qpid.test.unit.client.forwardall.Service.onMessage(
> > > > > > > > Service.java:59)
> > > > > > > >         at
> > > > org.apache.qpid.client.BasicMessageConsumer.notifyMessage
> > > > > > > > (
> > > > > > > > BasicMessageConsumer.java :628)
> > > > > > > >         at
> > > > org.apache.qpid.client.BasicMessageConsumer.notifyMessage
> > > > > > > > (
> > > > > > > > BasicMessageConsumer.java:583)
> > > > > > > >         at
> > > > > > > > org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage
> (
> > > > > > > > AMQSession.java:2579)
> > > > > > > >         at org.apache.qpid.client.AMQSession$Dispatcher.run(
> > > > > > > > AMQSession.java
> > > > > > > > :2502)
> > > > > > > >         - locked <0x88ff0240> (a java.lang.Object)
> > > > > > > >         - locked <0x88ff0238> (a java.lang.Object)
> > > > > > > > "main":
> > > > > > > >         at org.apache.qpid.client.AMQSession.close(
> > > AMQSession.java
> > > > > > > > :462)
> > > > > > > >         - waiting to lock <0x88ff0240> (a java.lang.Object)
> > > > > > > >         at
> > org.apache.qpid.client.AMQConnection.closeAllSessions
> > > (
> > > > > > > > AMQConnection.java :752)
> > > > > > > >         at org.apache.qpid.client.AMQConnection.close(
> > > > > > > > AMQConnection.java
> > > > > > > > :673)
> > > > > > > >         - locked <0x8b85a1a0> (a java.lang.Object)
> > > > > > > >         at org.apache.qpid.client.AMQConnection.close(
> > > > > > > > AMQConnection.java
> > > > > > > > :659)
> > > > > > > >         at
> > > > org.apache.qpid.test.unit.client.forwardall.Service.close
> > > > > > > > (
> > > > > > > > Service.java:71)
> > > > > > > >         at
> > > > > > > >
> > > org.apache.qpid.test.unit.client.forwardall.ServiceCreator.closeSC
> > > > (
> > > > > > > > ServiceCreator.java:57)
> > > > > > > >         at
> > > > > > > >
> > > > org.apache.qpid.test.unit.client.forwardall.ServiceCreator.closeAll(
> > > > > > > > ServiceCreator.java:66)
> > > > > > > >         at
> > > > > > > >
> > > org.apache.qpid.test.unit.client.forwardall.CombinedTest.tearDown
> > > > > > > > (CombinedTest.java:44)
> > > > > > > >         at junit.framework.TestCase.runBare (TestCase.java
> > :130)
> > > > > > > >         at junit.framework.TestResult$1.protect(
> > TestResult.java
> > > > :106)
> > > > > > > >         at junit.framework.TestResult.runProtected(
> > > TestResult.java
> > > > > > > > :124)
> > > > > > > >         at junit.framework.TestResult.run(TestResult.java
> :109)
> > > > > > > >         at junit.framework.TestCase.run(TestCase.java:118)
> > > > > > > >         at junit.framework.TestSuite.runTest(TestSuite.java
> > :208)
> > > > > > > >         at junit.framework.TestSuite.run(TestSuite.java:203)
> > > > > > > >         at junit.extensions.TestDecorator.basicRun
(
> > > > > > > > TestDecorator.java:22)
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message