qpid-dev mailing list archives

From "Mark Moseley (JIRA)" <j...@apache.org>
Subject [jira] Created: (QPID-2993) Federated source-local links crash remotely federated cluster member on local cluster startup
Date Sat, 08 Jan 2011 00:21:46 GMT
Federated source-local links crash remotely federated cluster member on local cluster startup
---------------------------------------------------------------------------------------------

                 Key: QPID-2993
                 URL: https://issues.apache.org/jira/browse/QPID-2993
             Project: Qpid
          Issue Type: Bug
          Components: C++ Broker, C++ Clustering
    Affects Versions: 0.8
         Environment: Debian Linux Squeeze, 32-bit, kernel 2.6.36.2, Dell Poweredge 1950s.
Corosync==1.3.0, Openais==1.1.4
            Reporter: Mark Moseley


This is related to QPID-2992, which I also opened, but this report concerns source-local
routes. Take the same setup as in QPID-2992, but with source-local routes (and with the
exchanges switched accordingly in the qpid-route statements): cluster A and cluster B, with
routes between A1<->B1. When cluster B is shut down in the order B2->B1 and started back up,
the static routes are not correctly re-bound on cluster A's side. If cluster B is instead
shut down in the order B1->B2 and started back up, the route is correctly created and works.
The non-functioning case (B2->B1, or A2->A1) has an additional side effect: in the B2->B1
case, qpidd crashes on node A2 with the following error (cluster A is called 'walclust',
B is 'bosclust'):

2011-01-07 18:57:35 error Channel exception: not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
2011-01-07 18:57:35 critical cluster(102.0.0.0:13650 READY/error) local error 2030 did not occur on member 101.0.0.0:9920: not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
2011-01-07 18:57:35 critical Error delivering frames: local error did not occur on all cluster members : not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39) (qpid/cluster/ErrorCheck.cpp:89)
2011-01-07 18:57:35 notice cluster(102.0.0.0:13650 LEFT/error) leaving cluster walclust
2011-01-07 18:57:35 notice Shut down
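
When repeating the test across several nodes, it can help to scan each qpidd log for the
"LEFT/error" notice to see which member dropped out. The following is only a minimal Python
sketch: the regular expression is inferred from the log lines quoted above rather than from
any documented qpidd log format, and the names `LOG`, `LEAVE_RE` and `find_error_departures`
are hypothetical helpers invented for this illustration.

```python
import re

# Sample lines copied from the log above (wrapped lines rejoined).
LOG = """\
2011-01-07 18:57:35 error Channel exception: not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
2011-01-07 18:57:35 critical cluster(102.0.0.0:13650 READY/error) local error 2030 did not occur on member 101.0.0.0:9920: not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
2011-01-07 18:57:35 critical Error delivering frames: local error did not occur on all cluster members : not-attached: Channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39) (qpid/cluster/ErrorCheck.cpp:89)
2011-01-07 18:57:35 notice cluster(102.0.0.0:13650 LEFT/error) leaving cluster walclust
2011-01-07 18:57:35 notice Shut down
"""

# Assumption: the "leaving cluster" notice always has the shape seen in the sample.
LEAVE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) notice "
    r"cluster\((?P<member>\S+) LEFT/error\) leaving cluster (?P<cluster>\S+)$"
)


def find_error_departures(log_text):
    """Return (timestamp, member, cluster) for each LEFT/error notice found."""
    departures = []
    for line in log_text.splitlines():
        match = LEAVE_RE.match(line.strip())
        if match:
            departures.append((match["ts"], match["member"], match["cluster"]))
    return departures


if __name__ == "__main__":
    for ts, member, cluster in find_error_departures(LOG):
        print(f"{ts}: member {member} left cluster {cluster} after an error")
```

Run against the sample, this reports member 102.0.0.0:13650 leaving 'walclust'. Mapping that
member ID back to a node name (A1 or A2) still has to be done by hand.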

This happens in both directions, so it is not limited to one cluster or the other. The crash
does *not* occur in the A1->A2/B1->B2 test (i.e. the test where the route is re-bound
correctly). I can reproduce it nearly every time, and I have been resetting the cluster
completely to a fresh state between tests. Occasionally in the B2->B1 test, A1 also crashes
with the same error (and vice versa, B1 in the A2->A1 test), though most of the time it is
A2/B2 that crashes.

I was getting the same behaviour before upgrading corosync/openais as well. Previously
I was using the stock Squeeze versions, corosync==1.2.1 and openais==1.1.2. The results
are the same with corosync==1.3.0 and openais==1.1.4.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org

