qpid-dev mailing list archives

From Håkan Johansson (JIRA) <j...@apache.org>
Subject [jira] [Created] (QPID-7051) Crash after reconnect with transactional session (with patch)
Date Mon, 08 Feb 2016 14:23:39 GMT
Håkan Johansson created QPID-7051:

             Summary: Crash after reconnect with transactional session (with patch)
                 Key: QPID-7051
                 URL: https://issues.apache.org/jira/browse/QPID-7051
             Project: Qpid
          Issue Type: Bug
          Components: C++ Client
    Affects Versions: qpid-cpp-0.34
         Environment: Red Hat Enterprise Linux Server release 6.7 (Santiago)

The broker is ActiveMQ 5.13.0.
The protocol used is AMQP 1.0.

            Reporter: Håkan Johansson

I have a test program (see the "consumer.cc" attachment) that creates a connection with "reconnect"
It then creates a transactional session and a receiver to some queue from that session.
It then reads all messages from the queue and prints out their content.
A sleep between reads leaves enough time to restart the broker during the test.

While the broker is down, the program tries to reconnect to it.
As soon as the reconnect succeeds, the fetch call throws an exception because the transaction
has become invalid.
The exception is caught and the read loop exits.
The test function then returns, causing the _Receiver_, _Session_, and _Connection_ objects
to be destructed.

The crash happens while destructing the _Connection_ object.

It took some digging, but I managed to find the reason for the crash.
When the _Connection_ object is destructed it automatically destructs its _ConnectionHandle_
object, which in turn destructs its _ConnectionContext_ object. Nothing strange here.
The _ConnectionContext_ destructor makes a call to its own _close_ method, which tries to
shut down all its sessions.

The problem is that the session has been invalidated by the disconnect, so the call to
_syncLH_ throws an exception that is not caught anywhere. The exception therefore escapes
the _ConnectionContext_ destructor. This is a big no-no in C++.
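
To illustrate why an escaping exception is fatal here, the following is a minimal sketch, assuming a C++11 or later compiler. _ThrowingContext_ and _GuardedContext_ are hypothetical stand-ins, not the real qpid classes. With older language standards the exact rules differ, but an exception leaving a destructor is unsafe there as well.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for a context whose close() can fail.
// It is never instantiated here: destroying it would end the program.
struct ThrowingContext {
    void close() { throw std::runtime_error("session is no longer valid"); }
    ~ThrowingContext() { close(); }  // the exception escapes the destructor
};

// Same stand-in, but the destructor handles the failure itself.
struct GuardedContext {
    bool* failed;  // set to true when close() threw
    explicit GuardedContext(bool* f) : failed(f) {}
    void close() { throw std::runtime_error("session is no longer valid"); }
    ~GuardedContext() {
        try {
            close();
        } catch (const std::exception&) {
            *failed = true;  // record the failure; nothing escapes
        }
    }
};

// Since C++11 a destructor is noexcept unless declared otherwise, so an
// exception escaping ~ThrowingContext would call std::terminate.
static_assert(std::is_nothrow_destructible<ThrowingContext>::value,
              "destructors are implicitly noexcept");

// Destroys a GuardedContext and reports whether close() failed on the way.
bool guarded_destruction_survives() {
    bool failed = false;
    {
        GuardedContext ctx(&failed);
    }  // destructor runs here and swallows the exception
    return failed;
}
```

Calling _guarded_destruction_survives()_ returns normally even though _close_ threw, whereas destroying a _ThrowingContext_ would terminate the process.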

A side effect of this is that the transport object is not closed before it is destructed,
which means that it is still listening for events. The crash happens when the next pending
event tries to use the destructed transport object.

The solution, in my humble opinion, is to catch the exception thrown by the _syncLH_ call
in the _ConnectionContext::close_ method.
This way we can try to close all sessions even if one or more of them have been invalidated
for some reason.
The rest of the cleanup process will then also run properly.
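
The idea can be sketched roughly as follows. This is not the attached patch and not the real _ConnectionContext_ code: _FakeSession_, _FakeContext_, and their members are hypothetical names used only to model a close loop that keeps going when one session fails.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical model of a session; "valid" is false after a disconnect.
struct FakeSession {
    bool valid;
    bool closed;
    explicit FakeSession(bool v) : valid(v), closed(false) {}

    // Models a sync that throws when the session has been invalidated.
    void sync() {
        if (!valid) throw std::runtime_error("transaction invalidated");
    }
    void close() {
        sync();
        closed = true;
    }
};

// Hypothetical model of the connection context.
struct FakeContext {
    std::vector<FakeSession> sessions;
    bool transportClosed;
    std::size_t failures;
    FakeContext() : transportClosed(false), failures(0) {}

    void close() {
        for (std::size_t i = 0; i < sessions.size(); ++i) {
            try {
                sessions[i].close();
            } catch (const std::exception&) {
                ++failures;  // note the failure, continue with the rest
            }
        }
        // Reached even if some sessions failed, so the transport is always
        // shut down before the object goes away.
        transportClosed = true;
    }
};

// Closes a context holding two valid sessions and one invalid session.
FakeContext close_with_one_invalid_session() {
    FakeContext ctx;
    ctx.sessions.push_back(FakeSession(true));
    ctx.sessions.push_back(FakeSession(false));
    ctx.sessions.push_back(FakeSession(true));
    ctx.close();
    return ctx;
}
```

In this model the invalid session is counted as a failure, the two valid sessions are still closed, and the transport shutdown step always runs.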

How to run the test program:
* Compile both "producer.cc" and "consumer.cc". They both need to be linked to the "qpidmessaging"
* Run "producer" once. This will add ten messages to the "apa.bepa" queue on the broker.
* Start "consumer".
* When the consumer starts to print out the messages, shut down and restart the broker.
