qpid-dev mailing list archives

From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect
Date Mon, 26 Mar 2012 18:28:28 GMT

    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238653#comment-13238653 ]

jiraposter@reviews.apache.org commented on QPID-3759:

(Updated 2012-03-26 18:26:34.827957)

Review request for qpid, Andrew Stitcher, Ted Ross, Chug Rolke, and Steve Huston.


This patch follows the same logic as the previous one while avoiding CancelIoEx.

CancelIo was considered as a substitute for CancelIoEx, but it has thread restrictions (it
only cancels I/O issued by the calling thread) that would have required a major rewrite of
the base code.

I have substituted a much blunter instrument to achieve the completion, namely a full closesocket
to unstick the read.  It forces all pending overlapped operations to complete, which in our
case is just the last read.


The cause of the hang was an outstanding read-side completion while the AsynchIO object in
charge of the socket was in the queuedClose state.

The completion handler drains outstanding async requests before closing the socket.  Since
the cable had been pulled, the async read would never complete until Windows gave up on the
socket altogether (some time much later).

This patch remembers the last aio read and, if the object is in the queuedClose state, cancels
that read before blocking again.
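
The hang and the fix described above can be modelled in a few lines. This is only a
platform-neutral sketch, not the code in AsynchIO.cpp: ModelSocket, ModelAsynchIO, poll and
runScenario are hypothetical names invented for illustration, no Windows API is called, and
closing the model socket merely stands in for closesocket forcing the pending overlapped read
to complete.

```cpp
#include <cassert>

// Hypothetical model types; none of these names come from AsynchIO.cpp.
enum class State { Open, QueuedClose, Closed };

struct ModelSocket {
    bool cablePulled = false;  // peer unreachable: a pending read never completes by itself
    bool closed = false;

    // A pending read completes if data can still arrive or the socket was closed.
    bool readCompleted() const { return closed || !cablePulled; }

    // Stand-in for closesocket(): forces pending operations to complete.
    void close() { closed = true; }
};

class ModelAsynchIO {
public:
    explicit ModelAsynchIO(ModelSocket& s) : socket(s) {}

    void startRead() { readPending = true; }          // remember the last read
    void queueClose() { current = State::QueuedClose; }
    State state() const { return current; }

    // One pass of the completion loop.
    // Returns false when the real code would block waiting for the read.
    bool poll(bool applyFix) {
        // The fix: in queuedClose with a remembered read, cancel it before blocking.
        if (applyFix && current == State::QueuedClose && readPending) {
            socket.close();
        }
        // Outstanding requests are drained before the socket may be closed.
        if (readPending) {
            if (!socket.readCompleted()) {
                return false;  // stuck: nothing will complete the read
            }
            readPending = false;
        }
        if (current == State::QueuedClose) {
            socket.close();
            current = State::Closed;
        }
        return true;
    }

private:
    ModelSocket& socket;
    State current = State::Open;
    bool readPending = false;
};

// Start a read, pull the cable, queue a close, then give the loop a few passes.
State runScenario(bool applyFix) {
    ModelSocket socket;
    ModelAsynchIO aio(socket);
    aio.startRead();
    socket.cablePulled = true;
    aio.queueClose();
    for (int i = 0; i < 3 && aio.state() != State::Closed; ++i) {
        aio.poll(applyFix);
    }
    return aio.state();
}
```

In this model runScenario(false) stays in QueuedClose no matter how many passes it is given,
while runScenario(true) reaches Closed on the first pass.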

Beyond the fix described in the Jira, I also removed an unused test for restartRead.  Removing
it doesn't change the logic of the section, but the test may indicate an intention that wasn't
fully coded, or something left over from a previous change.

This addresses bug QPID-3759.

Diffs (updated)

  http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/AsynchIO.cpp 1301636

Diff: https://reviews.apache.org/r/4383/diff


Testing: qpid-perftest, qpid-send, qpid-receive, cable pulls, broker pause/resumes



> Heartbeat timeout in Windows does not lead to timely reconnect
> --------------------------------------------------------------
>                 Key: QPID-3759
>                 URL: https://issues.apache.org/jira/browse/QPID-3759
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Client
>    Affects Versions: 0.14
>         Environment: Windows C++ messaging
>            Reporter: Chuck Rolke
>            Assignee: Cliff Jansen
>             Fix For: 0.17
>         Attachments: main.cpp
> Reported by Wolf Wolfswinkel on Qpid users http://qpid.2158936.n2.nabble.com/Heartbeats-in-C-broker-on-Windows-td7118702.html
> The simplest test case is in attached main.cpp. Establish a good network connection to
> the broker and then start the program. It creates a connection, sends two messages, and then
> pauses for 15 seconds. During the pause disconnect the network connection to the broker for
> at least two heartbeat timeouts (12 seconds).
> After the heartbeat timeout the timer task fires and a debug trace shows:
>  Traffic timeout,  TCPConnector::abort, TCPConnector::eof, TCPConnector::close
> But the connection is not actually closed until something happens on the network to wake
> up the thread waiting in Poller::run().
> The timer event appears unable to interrupt the IO thread waiting for the completion
