qpid-dev mailing list archives

From "Alan Conway" <acon...@redhat.com>
Subject Re: Review Request 20625: QPID-5719: HA becomes unresponsive once any of the brokers are SIGSTOPed
Date Thu, 24 Apr 2014 16:13:16 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20625/
-----------------------------------------------------------

(Updated April 24, 2014, 4:13 p.m.)


Review request for qpid, Gordon Sim and Kenneth Giusti.


Changes
-------

Addressed Ken's comments. With --config qpidd.conf, the settings in qpidd.conf are used as
defaults; they do not override settings given on the command line. Note that each part of the
broker URL is defaulted separately: for example, if a host but no port is specified we use the
port from qpidd.conf, and if no user/pass is specified we use the one from qpidd.conf.
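
As an illustration only, the per-part defaulting rule can be sketched in Python as below. This
is not the code from the diff: BrokerAddress, apply_defaults and all sample values are
hypothetical names chosen for the example, and user/pass is treated as a single unit.

```python
from collections import namedtuple

# Hypothetical container: credentials is a (user, password) pair or None.
BrokerAddress = namedtuple("BrokerAddress", "credentials host port")

def apply_defaults(cmdline, conf):
    """Use the qpidd.conf value only for parts the command line left unset."""
    return BrokerAddress(*[
        given if given is not None else default
        for given, default in zip(cmdline, conf)
    ])

# Sample values are made up for the example.
conf = BrokerAddress(("admin", "secret"), "localhost", 5672)
cmdline = BrokerAddress(None, "node1", None)  # host given, nothing else
merged = apply_defaults(cmdline, conf)
```

Here merged keeps the host from the command line and takes the port and user/pass from the
qpidd.conf values.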


Bugs: QPID-5719
    https://issues.apache.org/jira/browse/QPID-5719


Repository: qpid


Description
-------

QPID-5719: HA becomes unresponsive once any of the brokers are SIGSTOPed

- Added timeout to qpid-ha.
- qpidd init script pings broker to verify it is not hung.
- Updated documentation in qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml.
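
The idea behind the ping can be sketched roughly as follows: run a check that needs an answer
from the broker and treat "no answer within the timeout" as hung. This is a minimal Python
sketch, not the init script or qpid-ha itself; responds_within and the stand-in commands are
assumptions made for the example. A stopped process cannot reply to a request, so the check
has to wait for a reply rather than only open a connection.

```python
import subprocess
import sys

def responds_within(cmd, timeout):
    """Return True if cmd exits with status 0 within timeout seconds."""
    try:
        result = subprocess.run(cmd, timeout=timeout,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        # No answer in time: treat the broker as hung.
        return False

# Stand-in commands (assumptions): a quick exit plays a healthy broker,
# a long sleep plays a SIGSTOPed one that never answers.
healthy = [sys.executable, "-c", "pass"]
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
```

A caller such as an init script status check would report failure when responds_within
returns False, so that the cluster manager can restart the service.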

The new results for the cases mentioned in the bug:

a] stopped ALL brokers: rgmanager restarts the entire cluster but data is lost.
   Equivalent to killing all the brokers at once. This does not affect quorum because
   only qpidd services are affected, not other services managed by cman.

b] stopped the primary: rgmanager restarts the primary after a timeout and promotes one of
   the backups.

c] stopped a backup: rgmanager restarts the backup after a timeout.
   Clients that are actively sending messages may see a delay while the backup is restarted.

Note that you need to set link-heartbeat-interval in qpidd.conf. The default is very
high (120 seconds); it should be set lower to see recovery from SIGSTOP in a
reasonable time.
See the updated documentation in qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml.


Diffs (updated)
-----

  /trunk/qpid/cpp/etc/qpidd-primary.in 1589403 
  /trunk/qpid/cpp/etc/qpidd.in 1589403 
  /trunk/qpid/cpp/src/tests/ha_test.py 1589403 
  /trunk/qpid/cpp/src/tests/ha_tests.py 1589403 
  /trunk/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml 1589403 
  /trunk/qpid/tools/src/py/qpid-ha 1589403 

Diff: https://reviews.apache.org/r/20625/diff/


Testing
-------

Tested with 3 node cman cluster, passes full ctest.


Thanks,

Alan Conway

