qpid-dev mailing list archives

From "Alan Conway" <acon...@redhat.com>
Subject Re: Review Request 20625: QPID-5719: HA becomes unresponsive once any of the brokers are SIGSTOPed
Date Thu, 24 Apr 2014 12:55:15 GMT


> On April 24, 2014, 12:43 p.m., Kenneth Giusti wrote:
> > /trunk/qpid/tools/src/py/qpid-ha, line 71
> > <https://reviews.apache.org/r/20625/diff/1/?file=566017#file566017line71>
> >
> >     Does this silently override the value supplied by the user?  Should this only
> >     be done if the user -hasn't- explicitly set --timeout?
> >     
> >     I guess I don't understand why this is done.

I introduced the --config option as a fix for the init scripts, which always want to talk to
localhost with the configuration taken from qpidd.conf, so I ignore all the user settings
if --config is passed; e.g. I always set broker=localhost as well. I was going to be lazy
and leave it like that on the assumption that --config is only going to be used by those scripts.

However, you are right: it would be better to let user args take precedence, and then --config
would be more useful outside the init scripts as well. I already felt the itch for that during
testing, where it would have been handy to take the SASL guff out of the local qpidd.conf but
connect to a different host. I will bite the bullet and Do It Right before I commit. Damn
code reviews.
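
For illustration only, here is a minimal sketch of the precedence I have in mind:
explicit command-line args win over values read from the config file, which in turn
win over built-in defaults. This is not the actual qpid-ha code. The config key names
("broker", "timeout"), the default values and the helper names are hypothetical, and
it is written for Python 3.

```python
import argparse

# Hypothetical built-in defaults; the real tool's defaults may differ.
DEFAULTS = {"broker": "localhost", "timeout": 10.0}


def read_conf(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def resolve(argv, conf_text):
    """Merge settings: explicit args > config file > defaults."""
    parser = argparse.ArgumentParser()
    # default=None lets us tell "not given" apart from "given".
    parser.add_argument("--broker", default=None)
    parser.add_argument("--timeout", type=float, default=None)
    args = parser.parse_args(argv)

    explicit = {k: v for k, v in vars(args).items() if v is not None}
    conf = read_conf(conf_text)
    from_conf = {k: conf[k] for k in DEFAULTS if k in conf}

    # Later dicts override earlier ones, so user args come last.
    merged = {**DEFAULTS, **from_conf, **explicit}
    merged["timeout"] = float(merged["timeout"])
    return merged


if __name__ == "__main__":
    conf_text = "# sample config\nbroker=host-a\ntimeout=5\n"
    print(resolve(["--timeout", "2"], conf_text))
```

The point of default=None is that a timeout given on the command line is never
silently replaced by the config file, which is the concern raised in the review.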


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20625/#review41287
-----------------------------------------------------------


On April 23, 2014, 7:30 p.m., Alan Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20625/
> -----------------------------------------------------------
> 
> (Updated April 23, 2014, 7:30 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kenneth Giusti.
> 
> 
> Bugs: QPID-5719
>     https://issues.apache.org/jira/browse/QPID-5719
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> QPID-5719: HA becomes unresponsive once any of the brokers are SIGSTOPed
> 
> - Added timeout to qpid-ha.
> - qpidd init script pings broker to verify it is not hung.
> - updated documentation in qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml.
> 
> The new results for the cases mentioned in the bug:
> 
> a] stopped ALL brokers: rgmanager restarts the entire cluster but data is lost.
>    Equivalent to killing all the brokers at once. This does not affect quorum because
>    only qpidd services are affected, not other services managed by cman.
> 
> b] stopped the primary: rgmanager restarts the primary after a timeout and promotes one
>    of the backups.
> 
> c] stopped a backup: rgmanager restarts the backups after a timeout.
>    Clients that are actively sending messages may see a delay while backup is restarted.
> 
> Note that you need to set link-heartbeat-interval in qpidd.conf. The default is very
> high (120 seconds); it should be set lower to see recovery from SIGSTOP in a
> reasonable time.
> See the updated documentation in qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/etc/qpidd-primary.in 1589403 
>   /trunk/qpid/cpp/etc/qpidd.in 1589403 
>   /trunk/qpid/cpp/src/tests/ha_test.py 1589403 
>   /trunk/qpid/cpp/src/tests/ha_tests.py 1589403 
>   /trunk/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml 1589403 
>   /trunk/qpid/tools/src/py/qpid-ha 1589403 
> 
> Diff: https://reviews.apache.org/r/20625/diff/
> 
> 
> Testing
> -------
> 
> Tested with 3 node cman cluster, passes full ctest.
> 
> 
> Thanks,
> 
> Alan Conway
> 
>

