qpid-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gordon Sim" <g...@redhat.com>
Subject Re: Review Request 15677: Windows client error closing AMQP 1.0 connection
Date Wed, 20 Nov 2013 10:58:55 GMT


> On Nov. 19, 2013, 8:08 p.m., Andrew Stitcher wrote:
> > I don't think the locks can deadlock, so from that point of view they are ok theoretically.
> > 
> > But there are some very important open questions in my mind:
> > 1. Why there are multiple callers of close and is there a good reason for it.
> > 2. Why don't we see crashes under Linux too, and does this fix work there?
> > 3. Also this looks like cut'n'paste programming from code elsewhere in the tree
- are you sure that the original doesn't also have the bug (and if not why not?)

1. I can see two hypothetical reasons for the problem. The first is that the disconnected
callback is called by the windows IO layer when the remote peer - i.e. the broker - closes
the socket, whereas on linux the eof callback is called. If, as seems to be the case for linux,
a disconnected callback is always followed by a closed callback, then socketClosed(), and
therefore queueForDeletion() would be called twice. The second possibility is disconnection
by the remote peer concurrent with a close request from the application. Certainly the code
as it is now (prior to this patch) has a race condition in it that doesn't handle that concurrency
adequately. The patch here addresses both of these hypothetical problems. Some extra tracing
could shed some light in which (if either) of these is happening.

2. The fix does work on linux. My guess as to why we don't see the same crashes is due to
differences in the underlying AsynchIO implementations. Perhaps its as simple as timing related.
It does seem that on linux the disconnected event isn't called, at least in the common cases.
Perhaps the posix implementation is more tolerant of double calls to queueForDeletion()? 

3. The code here is largely copied from qpid::client::TcpConnector, differing mainly in the
encode and decode. There is some other locking in that class that would I think prevent the
race. The same concerns around the disconnected callback handler apply, but I'm wondering
whether that is only ever actually called in response to abort()?.


- Gordon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15677/#review29132
-----------------------------------------------------------


On Nov. 19, 2013, 7:53 p.m., Chug Rolke wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15677/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2013, 7:53 p.m.)
> 
> 
> Review request for qpid, Andrew Stitcher and Steve Huston.
> 
> 
> Bugs: QPID-5363
>     https://issues.apache.org/jira/browse/QPID-5363
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Windows fails with an access violation about 10% of the time while closing AMQP 1.0 connections:
> 
> STACK
> =====
> >        qpidmessagingd.dll!qpid::messaging::amqp::TcpTransport::close()  Line 123
+ 0x10 bytes        C++
>          qpidmessagingd.dll!qpid::messaging::amqp::ConnectionContext::close()  Line 134
+ 0x29 bytes        C++
>          qpidmessagingd.dll!qpid::messaging::amqp::ConnectionHandle::close()  Line 62
       C++
>          qpidmessagingd.dll!qpid::messaging::Connection::close()  Line 78 + 0x24 bytes
       C++
>          hello_world.exe!main(int argc=4, char * * argv=0x0053a688)  Line 51 + 0xe bytes
       C++
>          hello_world.exe!__tmainCRTStartup()  Line 586 + 0x19 bytes        C
>          hello_world.exe!mainCRTStartup()  Line 403        C
> 
> CODE
> ====
> void TcpTransport::close()
> {
>     QPID_LOG(debug, id << " TcpTransport closing...");
>     if (aio)
>         aio->queueWriteClose();    <=======
> }
> 
> FAILURE
> =======
> aio's vftable is all 0xdddddddd, indicating that it has been deleted.
> 
> THE FIX
> =======
> Use some locks to protect closing the aio object.
> 
> DISCUSSION
> ==========
> The locks in the diff appear to work ok in that the code passes thousands of runs. Are
they OK theoretically?
> 
> Include in 0.26 release
> -----------------------
> Please indicate your approval or not in QPID-5363.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/cpp/src/qpid/messaging/amqp/TcpTransport.h 1543532 
>   trunk/qpid/cpp/src/qpid/messaging/amqp/TcpTransport.cpp 1543532 
> 
> Diff: https://reviews.apache.org/r/15677/diff/
> 
> 
> Testing
> -------
> 
> Running HelloWorld in a loop in multiple windows works ok.
> 
> Passes unit tests
> 
> 
> Thanks,
> 
> Chug Rolke
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message