qpid-dev mailing list archives

From "Alan Conway (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (QPID-4394) HA replication hangs when QMF events arrive out of order
Date Wed, 24 Oct 2012 20:36:12 GMT

    [ https://issues.apache.org/jira/browse/QPID-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483568#comment-13483568 ]

Alan Conway commented on QPID-4394:
-----------------------------------

We don't have a stand-alone reproducer for this, but by inspection it's clear that QMF events
can be sent in the wrong order.

This is not just an HA problem: any QMF client can receive misordered events.
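
As a rough illustration of the ordering property at stake, here is a minimal Python sketch. It is not Qpid code: `Registry`, `replay`, and `run` are hypothetical names, and the `events` list only stands in for the outgoing QMF event stream. The assumption being shown is that if each event is recorded while holding the same lock that protects the state change, the event order matches the order of the changes, so any consumer replaying the events ends up with the same state. This is one possible approach, not a description of the fix for this issue.

```python
import threading


class Registry:
    """Hypothetical stand-in for a broker's queue registry (not Qpid code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.queues = set()
        self.events = []  # stands in for the outgoing event stream

    def create(self, name):
        # The state change and the event are recorded under one lock,
        # so the event order matches the order of the state changes.
        with self._lock:
            if name not in self.queues:
                self.queues.add(name)
                self.events.append(("create", name))

    def delete(self, name):
        with self._lock:
            if name in self.queues:
                self.queues.discard(name)
                self.events.append(("delete", name))


def replay(events):
    """Rebuild the set of queues from an event stream, as a consumer would."""
    state = set()
    for kind, name in events:
        if kind == "create":
            state.add(name)
        else:
            state.discard(name)
    return state


def run(iterations=1000):
    """Two threads repeatedly create and delete the same queue."""
    registry = Registry()

    def worker():
        for _ in range(iterations):
            registry.create("someQueue")
            registry.delete("someQueue")

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    registry.create("someQueue")  # leave the queue in place at the end
    return registry
```

With this arrangement, `replay(registry.events)` always equals `registry.queues`. If the event were appended after releasing the lock, two threads could record their events in the opposite order from the state changes, which is the kind of misordering described here.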
                
> HA replication hangs when QMF events arrive out of order
> --------------------------------------------------------
>
>                 Key: QPID-4394
>                 URL: https://issues.apache.org/jira/browse/QPID-4394
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Clustering
>    Affects Versions: 0.19
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>
> With the new replication-based clustering in 0.18 MRG-M, replication can hang
> if the QMF events arrive in the wrong order.  I am running the following test, which
> reproduces the hang:
> - Start a client with 2 threads
> - Each thread creates its own Connection, Session, and a Receiver using the address
>   "someQueue; {create:always, node: {x-declare: {auto-delete:True}}}"
> - Run a loop like this (pseudocode):
> while(receiver.get(message)) {
>   // do stuff
>   if at least 5 seconds have passed {
>     connection.close();
>     reconnectAndRecreateReceiver();
>     receiver.setCapacity(1000);
>   }
> }
> During this loop, the 2 threads disconnect and reconnect every 5 seconds.  When
> connecting, one of them creates the queue.  When disconnecting, the queue is deleted.
> At some point, the queue creation event may arrive at the backup broker before
> the preceding queue deletion event (i.e. in the wrong order), because no lock governs when
> queue creation/deletion events are emitted.  When this happens, the backup broker doesn't
> subscribe to the primary to replicate the queue in question, and things hang.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org

