qpid-dev mailing list archives

From "Mark Moseley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (QPID-2992) Cluster failing to resurrect durable static route depending on order of shutdown
Date Tue, 11 Jan 2011 01:10:46 GMT

    [ https://issues.apache.org/jira/browse/QPID-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979891#action_12979891 ]

Mark Moseley commented on QPID-2992:
------------------------------------

I also rewrote the script to do a B1->B2->B2->B1 shutdown/startup sequence first
(the binding was visible after that), then do a B2->B1->B1->B2 stop/start, and the
binding wasn't there. Maybe it gets a single freebie in a super clean cluster?

I had originally posted to the list since I figured I was probably doing something wrong,
so there could be some conceptual problem on my part, i.e. maybe it's not supposed to work
like I'm expecting.
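
The two sequences above can be sketched as a small dry-run helper. This is a sketch under stated assumptions, not the attached cluster-fed.sh: the function name cycle_cluster, the RUN variable, and the ssh plus /etc/init.d/qpidd invocation are all hypothetical, and B1/B2 are the normalized host names from the report. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: RUN defaults to "echo", so commands are printed, not executed.
# The ssh + init-script invocation is an assumption; substitute whatever
# actually stops and starts qpidd on the hosts in question.
RUN=${RUN:-echo}

# Stop two brokers in the given order, then start them in the reverse order
# (the last broker stopped is the first one started).
cycle_cluster() {
    first=$1
    second=$2
    $RUN ssh "$first" /etc/init.d/qpidd stop
    $RUN ssh "$second" /etc/init.d/qpidd stop
    $RUN ssh "$second" /etc/init.d/qpidd start
    $RUN ssh "$first" /etc/init.d/qpidd start
}
```

Calling cycle_cluster B1 B2 and then cycle_cluster B2 B1 gives the B1->B2->B2->B1 sequence followed by B2->B1->B1->B2, with a bindings check in between each.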

> Cluster failing to resurrect durable static route depending on order of shutdown
> --------------------------------------------------------------------------------
>
>                 Key: QPID-2992
>                 URL: https://issues.apache.org/jira/browse/QPID-2992
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker, C++ Clustering
>    Affects Versions: 0.8
>         Environment: Debian Linux Squeeze, 32-bit, kernel 2.6.36.2, Dell Poweredge 1950s.
Corosync==1.3.0, Openais==1.1.4
>            Reporter: Mark Moseley
>            Assignee: Alan Conway
>         Attachments: cluster-fed.sh, error
>
>
> I've got a 2-node qpid test cluster at each of 2 datacenters, which are federated together
with a single durable static route between each. Qpid is version 0.8. Corosync and openais
are stock Squeeze (1.2.1-3 and 1.1.2-2, respectively). OS is Squeeze, 32-bit, on Dell Poweredge
1950s, kernel 2.6.36. The static route is durable and is set up over SSL (but I can replicate
as well with non-SSL). I've tried to normalize the hostnames below to make things clearer;
hopefully I didn't mess anything up.
> Given two clusters, cluster A (consisting of hosts A1 and A2) and cluster B (with B1
and B2), I've got a static exchange route from A1 to B1, as well as another from B1 to A1.
Federation is working correctly, so I can send a message on A2 and have it successfully retrieved
on B2. The exchange local to cluster A is walmyex1; the local exchange for B is bosmyex1.
> If I shut down the cluster in this order: B2, then B1, and start back up with B1, B2,
the static route fails to get recreated. That is, on A1/A2, looking at the bindings,
exchange 'bosmyex1' does not get re-bound to cluster B; the only output for it in "qpid-config
exchanges --bindings" is just:
> <snip>
> Exchange 'bosmyex1' (direct)
> </snip>
> If however I shut the cluster down in this order: B1, then B2, and start B2, then B1,
the static route gets re-bound. The output then is:
> <snip>
> Exchange 'bosmyex1' (direct)
>     bind [unix.boston.cust] => bridge_queue_1_8870523d-2286-408e-b5b5-50d53db2fa61
> </snip>
> and I can message over the federated link with no further modification. Prior to a few
minutes ago, I was seeing this with the Squeeze stock openais==1.1.2 and corosync==1.2.1.
In debugging this, I've upgraded both to the latest versions with no change.
> I can replicate this every time I try. These are just test clusters, so I don't have
any other activity going on on them, or any other exchanges/queues. My steps:
> On all boxes in cluster A and B:
> * Kill the qpidd if it's running and delete all existing store files, i.e. contents of
/var/lib/qpid/
> On host A1 in cluster A (I'm leaving out the -a user/test@host stuff):
> * Start up qpid
> * qpid-config add exchange direct bosmyex1 --durable
> * qpid-config add exchange direct walmyex1 --durable
> * qpid-config add queue walmyq1 --durable
> * qpid-config bind walmyex1 walmyq1 unix.waltham.cust
> On host B1 in cluster B:
> * qpid-config add exchange direct bosmyex1 --durable
> * qpid-config add exchange direct walmyex1 --durable
> * qpid-config add queue bosmyq1 --durable
> * qpid-config bind bosmyex1 bosmyq1 unix.boston.cust
> On cluster A:
> * Start other member of cluster, A2
> * qpid-route route add amqps://user/pass@HOSTA1:5671 amqps://user/pass@HOSTB1:5671 walmyex1
unix.waltham.cust -d
> On cluster B:
> * Start other member of cluster, B2
> * qpid-route route add amqps://user/pass@HOSTB1:5671 amqps://user/pass@HOSTA1:5671 bosmyex1
unix.boston.cust -d
> On either cluster:
> * Check "qpid-config exchanges --bindings" to make sure bindings are correct for remote
exchanges
> * To see correct behaviour, stop cluster in the order B1->B2, or A1->A2, start
cluster back up, check bindings.
> * To see broken behaviour, stop cluster in the order B2->B1, or A2->A1, start cluster
back up, check bindings.
> This is a test cluster, so I'm free to do anything with it, debugging-wise, that would
be useful. 
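
The "check bindings" step in the description can be automated with a small filter over the "qpid-config exchanges --bindings" output. This is a minimal sketch that assumes the output has the shape quoted in the description (an "Exchange 'name' (type)" line followed by indented "bind [key] => queue" lines); the function name has_binding is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "qpid-config exchanges --bindings" output on
# stdin and returns 0 if the named exchange has a binding with the given key,
# 1 otherwise. Assumes the output format quoted in the issue description.
has_binding() {
    awk -v ex="$1" -v key="$2" -v q="'" '
        # A new "Exchange ..." line starts a section; note whether it is ours.
        /^Exchange / { in_ex = (index($0, "Exchange " q ex q) == 1); next }
        # Inside our exchange, look for a bind line carrying the key.
        in_ex && index($0, "bind [" key "]") > 0 { found = 1 }
        END { exit !found }
    '
}
```

For example, on A1 or A2: qpid-config exchanges --bindings | has_binding bosmyex1 unix.boston.cust && echo "route re-bound" (with the -a user/test@host option added as needed).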

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org

