qpid-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Godfrey" <rob.j.godf...@gmail.com>
Subject Re: [java] Deadlock issue in trunk
Date Mon, 17 Sep 2007 16:20:55 GMT
No - the point is to wrap the whole operation in a single mutex lock

-- Rob

On 17/09/2007, Rupert Smith <rupertlssmith@googlemail.com> wrote:
>
> Couldn't you just have dropped the acquire failover mutex that wraps the
> whole process, and then acquire delivery lock followed by failover mutex
> followed by session close, iterated over all sessions?
>
> Rupert
>
> On 17/09/2007, Robert Godfrey <rob.j.godfrey@gmail.com> wrote:
> >
> > On 17/09/2007, Rupert Smith <rupertlssmith@googlemail.com> wrote:
> > >
> > > Merging that to trunk will fix Rajith's observed deadlock. As I just
> > > volunteered to do some merging, the fix will appear on trunk soon.
> > >
> > > Got to love that recursion till empty, followed by the more
> conventional
> > > iteration of the 'closeAllSessions' method. God help anyone who tries
> to
> > > read this code... but it works.
> >
> >
> >
> > Yeah - unfortunately there isn't an easy way to take out the locks by
> > iterating... so recursion it had to be...  it's not the code I'm most
> > proud
> > of writing, that's for sure :-)
> >
> > -- Rob
> >
> >
> > Rupert
> > >
> > > On 17/09/2007, Robert Godfrey <rob.j.godfrey@gmail.com> wrote:
> > > >
> > > > M2 should have a fix in that gets all the messageDeliveryLocks when
> > you
> > > do
> > > > a
> > > > Connection.close before it gets the mutex...
> > > >
> > > > Specifically, this nasty bit of recursion -
> > > >
> > > > public void close(List<AMQSession> sessions, long timeout) throws
> > > > JMSException
> > > >     {
> > > >         synchronized(_sessionCreationLock)
> > > >         {
> > > >             if(!sessions.isEmpty())
> > > >             {
> > > >                 AMQSession session = sessions.remove(0);
> > > >                 synchronized(session.getMessageDeliveryLock())
> > > >                 {
> > > >                     close(sessions, timeout);
> > > >                 }
> > > >             }
> > > >             else
> > > >
> > > >
> > > >
> > > > ...
> > > >
> > > > -- Rob
> > > >
> > > > On 17/09/2007, Rupert Smith <rupertlssmith@googlemail.com> wrote:
> > > > >
> > > > > M2, Is it significantly different on trunk?
> > > > >
> > > > > On 17/09/2007, Robert Godfrey <rob.j.godfrey@gmail.com> wrote:
> > > > > >
> > > > > > Rupert - which branch of the code are you looking at?
> > > > > >
> > > > > > -- Rob
> > > > > >
> > > > > > On 17/09/2007, Rupert Smith <rupertlssmith@googlemail.com>
> wrote:
> > > > > > >
> > > > > > > Sorry, I'm talking rubbish. The ordering of the locks in
> > > > > > AMQSession.closeis
> > > > > > > fine, and as it should be. Message delivery, followed by
> > failover
> > > > > mutex.
> > > > > > >
> > > > > > > The problematic interaction is because the call to
> > > > AMQConnection.close
> > > > > ,
> > > > > > > obtains the failover mutex first. I'd be tempted to try
and
> > > > eliminate
> > > > > > that
> > > > > > > one, so that as each session is closed delivery then failover
> is
> > > > > > > obtained/released for each.
> > > > > > >
> > > > > > > You could remove the failover mutex from createTextMessage,
> but
> > > > other
> > > > > > > methods that do need to obtain it can be called from within
a
> > > > > > dispathcher
> > > > > > > thread, so the problem can still occur.
> > > > > > >
> > > > > > > Rupert
> > > > > > >
> > > > > > > On 17/09/2007, Rupert Smith <rupertlssmith@googlemail.com>
> > wrote:
> > > > > > > >
> > > > > > > > Also messageDeliveryLock seems like a poor choice
of name.
> > > > > > > > 'sessionCloseMutex' might be better.
> > > > > > > >
> > > > > > > > On 17/09/2007, Rupert Smith < rupertlssmith@googlemail.com>
> > > wrote:
> > > > > > > > >
> > > > > > > > > Actually, I see it now, the message delivery
lock is
> > obtained
> > > in
> > > > > the
> > > > > > > > > dispatcher run method.
> > > > > > > > >
> > > > > > > > > So createTextMessage is doing:
> > > > > > > > >
> > > > > > > > > 1. Obtain message delivery lock.
> > > > > > > > > 2. Obtain failover mutex.
> > > > > > > > >
> > > > > > > > > And the main thread calling close is doing:
> > > > > > > > >
> > > > > > > > > 1. Obtain failover mutex.
> > > > > > > > > 2. Obtain message delivery lock.
> > > > > > > > >
> > > > > > > > > Just swap the lock ordering as I suggested. Unless
this
> > > > conflicts
> > > > > > with
> > > > > > > > > other orderings...
> > > > > > > > >
> > > > > > > > > On 17/09/2007, Rupert Smith <rupertlssmith@googlemail.com>
> > > > wrote:
> > > > > > > > > >
> > > > > > > > > > It looks to me that the message delivery
lock is the
> more
> > > > > > > fundamental
> > > > > > > > > > cause of the dead lock. I can't figure out
from the
> stack
> > > > trace
> > > > > > why
> > > > > > > the
> > > > > > > > > > dispatcher thread is holding that one (I
suppose I
> should
> > > look
> > > > > at
> > > > > > > what the
> > > > > > > > > > test is doing to figure that out).
> > > > > > > > > >
> > > > > > > > > > Can we swap the order of obtaining the locks
in
> > > > AMQSession.close
> > > > > > > (long
> > > > > > > > > > timeout) from
> > > > > > > > > >
> > > > > > > > > > synchronized(_messageDeliveryLock)
> > > > > > > > > > synchronized (_connection.getFailoverMutex())
> > > > > > > > > >
> > > > > > > > > > to
> > > > > > > > > >
> > > > > > > > > > synchronized (_connection.getFailoverMutex())
> > > > > > > > > > synchronized(_messageDeliveryLock)
> > > > > > > > > >
> > > > > > > > > > ?
> > > > > > > > > >
> > > > > > > > > > Unless that causes a different dead lock,
it looks like
> it
> > > > will
> > > > > > fix
> > > > > > > > > > the problem.
> > > > > > > > > >
> > > > > > > > > > Also, it looks to me that removing the failover
mutex
> > > > > acquisition
> > > > > > > from
> > > > > > > > > > createTextMessage will be harmless enough.
> > > > > > > > > >
> > > > > > > > > > Rupert
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On 15/09/2007, Rajith Attapattu < rajith77@gmail.com>
> > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > When I run the latest version of trunk
I get a hang on
> > > > > > > > > > >
> org.apache.qpid.test.unit.client.forwardall.CombinedTest
> > > > > > > > > > > kill -3 (thread dump) provides information
to say
> there
> > is
> > > a
> > > > > > > > > > > deadlock.
> > > > > > > > > > >
> > > > > > > > > > > I also noted that all
> AMQSession.createXXXMessagecontains
> > > > > > > > > > > synchronization
> > > > > > > > > > > code.
> > > > > > > > > > > (in this case the deadlock happens
inn
> > > > > > > AMQSession.createTextMessagetrying
> > > > > > > > > > > to obtain the failover mutex)
> > > > > > > > > > >
> > > > > > > > > > > Do we really need this synchronization
code when
> > creating
> > > > > > > messages?
> > > > > > > > > > > I am not
> > > > > > > > > > > really convinced it is needed.
> > > > > > > > > > > Can someody explain the reason for
that code block?
> > > > > > > > > > >
> > > > > > > > > > > Regard,
> > > > > > > > > > >
> > > > > > > > > > > Rajith
> > > > > > > > > > >
> > > > > > > > > > > I have attached the trace below.
> > > > > > > > > > >
> > > > > > > > > > > Found one Java-level deadlock:
> > > > > > > > > > > =============================
> > > > > > > > > > > "Dispatcher-Channel-1":
> > > > > > > > > > >   waiting to lock monitor 0x09ff950c
(object
> 0x8b85a1a0,
> > a
> > > > > > > > > > > java.lang.Object
> > > > > > > > > > > ),
> > > > > > > > > > >   which is held by "main"
> > > > > > > > > > > "main":
> > > > > > > > > > >   waiting to lock monitor 0x09ff964c
(object
> 0x88ff0240,
> > a
> > > > > > > > > > > java.lang.Object
> > > > > > > > > > > ),
> > > > > > > > > > >   which is held by "Dispatcher-Channel-1"
> > > > > > > > > > >
> > > > > > > > > > > Java stack information for the threads
listed above:
> > > > > > > > > > > ===================================================
> > > > > > > > > > > "Dispatcher-Channel-1":
> > > > > > > > > > >         at
> > > > org.apache.qpid.client.AMQSession.createTextMessage
> > > > > (
> > > > > > > > > > > AMQSession.java:1057)
> > > > > > > > > > >         - waiting to lock <0x8b85a1a0>
(a
> > java.lang.Object
> > > )
> > > > > > > > > > >         at
> > > > org.apache.qpid.client.AMQSession.createTextMessage(
> > > > > > > > > > > AMQSession.java:1068)
> > > > > > > > > > >         at
> > > > > > > > > > >
> > > > org.apache.qpid.test.unit.client.forwardall.Service.onMessage(
> > > > > > > > > > > Service.java:59)
> > > > > > > > > > >         at
> > > > > > > org.apache.qpid.client.BasicMessageConsumer.notifyMessage
> > > > > > > > > > > (
> > > > > > > > > > > BasicMessageConsumer.java :628)
> > > > > > > > > > >         at
> > > > > > > org.apache.qpid.client.BasicMessageConsumer.notifyMessage
> > > > > > > > > > > (
> > > > > > > > > > > BasicMessageConsumer.java:583)
> > > > > > > > > > >         at
> > > > > > > > > > >
> > > org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage
> > > > (
> > > > > > > > > > > AMQSession.java:2579)
> > > > > > > > > > >         at
> > > org.apache.qpid.client.AMQSession$Dispatcher.run(
> > > > > > > > > > > AMQSession.java
> > > > > > > > > > > :2502)
> > > > > > > > > > >         - locked <0x88ff0240>
(a java.lang.Object)
> > > > > > > > > > >         - locked <0x88ff0238>
(a java.lang.Object)
> > > > > > > > > > > "main":
> > > > > > > > > > >         at org.apache.qpid.client.AMQSession.close(
> > > > > > AMQSession.java
> > > > > > > > > > > :462)
> > > > > > > > > > >         - waiting to lock <0x88ff0240>
(a
> > java.lang.Object
> > > )
> > > > > > > > > > >         at
> > > > > org.apache.qpid.client.AMQConnection.closeAllSessions
> > > > > > (
> > > > > > > > > > > AMQConnection.java :752)
> > > > > > > > > > >         at org.apache.qpid.client.AMQConnection.close(
> > > > > > > > > > > AMQConnection.java
> > > > > > > > > > > :673)
> > > > > > > > > > >         - locked <0x8b85a1a0>
(a java.lang.Object)
> > > > > > > > > > >         at org.apache.qpid.client.AMQConnection.close(
> > > > > > > > > > > AMQConnection.java
> > > > > > > > > > > :659)
> > > > > > > > > > >         at
> > > > > > > org.apache.qpid.test.unit.client.forwardall.Service.close
> > > > > > > > > > > (
> > > > > > > > > > > Service.java:71)
> > > > > > > > > > >         at
> > > > > > > > > > >
> > > > > >
> org.apache.qpid.test.unit.client.forwardall.ServiceCreator.closeSC
> > > > > > > (
> > > > > > > > > > > ServiceCreator.java:57)
> > > > > > > > > > >         at
> > > > > > > > > > >
> > > > > > >
> > > org.apache.qpid.test.unit.client.forwardall.ServiceCreator.closeAll(
> > > > > > > > > > > ServiceCreator.java:66)
> > > > > > > > > > >         at
> > > > > > > > > > >
> > > > > >
> org.apache.qpid.test.unit.client.forwardall.CombinedTest.tearDown
> > > > > > > > > > > (CombinedTest.java:44)
> > > > > > > > > > >         at junit.framework.TestCase.runBare
(
> > TestCase.java
> > > > > :130)
> > > > > > > > > > >         at junit.framework.TestResult$1.protect(
> > > > > TestResult.java
> > > > > > > :106)
> > > > > > > > > > >         at junit.framework.TestResult.runProtected(
> > > > > > TestResult.java
> > > > > > > > > > > :124)
> > > > > > > > > > >         at junit.framework.TestResult.run(
> > TestResult.java
> > > > :109)
> > > > > > > > > > >         at junit.framework.TestCase.run(TestCase.java
> > :118)
> > > > > > > > > > >         at junit.framework.TestSuite.runTest(
> > > TestSuite.java
> > > > > :208)
> > > > > > > > > > >         at junit.framework.TestSuite.run(
> TestSuite.java
> > > :203)
> > > > > > > > > > >         at junit.extensions.TestDecorator.basicRun
(
> > > > > > > > > > > TestDecorator.java:22)
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message