qpid-dev mailing list archives

From Gordon Sim <g...@redhat.com>
Subject Re: Qpid post-mortem and request for suggestions for (my) next release challenge (10M msgs/sec on Windows)
Date Tue, 18 Jun 2013 10:10:09 GMT
On 06/17/2013 07:22 PM, Kerry Bonin wrote:
> On Mon, Jun 17, 2013 at 7:26 AM, Gordon Sim <gsim@redhat.com> wrote:
>> On 06/14/2013 03:58 PM, Kerry Bonin wrote:
>>>   - to prevent network splits, how are recovered brokers monitored?  When a
>>> failed broker recovers, do clients switch back?  How often, and how
>>> aggressively, are recovered brokers checked?
>> No, there is no switch back behaviour in the client. The new HA code
>> allows a broker to be classed as in a backup or primary role and backups
>> will reject or kick off any clients causing them to failover. Whatever
>> cluster management solution was in use would then detect a change of primary
>> and use QMF to tell each broker what their role was.
> I'd like to suggest that this is a serious deficiency.  It would be nice if
> it was possible to have some HA features without having to deploy
> clustering.

Just to be clear, what I'm referring to above does not involve the old 
clustering solution, which is tightly bound to corosync.

The new HA code has no external dependencies. It does, however, leave the 
task of managing the cluster to some external system (rgmanager, Pacemaker, 
etc.), which would be responsible for deciding which broker is the primary, 
detecting failure, electing a new primary, and handling restart and failback.

The broker simply provides the hooks through which the cluster management 
solution notifies each broker in the cluster of its role.

It does rely on federation, but once that issue is resolved it should 
work on Windows as well.

Though I haven't actually tried it, I suspect it may be possible to use 
the QMF-exposed hooks without any replication. (At the very least, I 
would not expect it to take a great deal of modification to get that 
working.)
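
To illustrate that division of responsibilities, here is a toy Python model. It is a sketch under stated assumptions, not Qpid code: Broker.set_role stands in for the QMF hook, while elect_primary and connect are hypothetical stand-ins for the external cluster manager and a failover-capable client. As in the paragraph above, there is no replication at all. A restarted broker comes back as a backup and rejects clients until the manager promotes it.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    BACKUP = "backup"


class Broker:
    """Toy broker that enforces whatever role it is told; no replication."""

    def __init__(self, name):
        self.name = name
        self.role = Role.BACKUP  # a (re)started broker comes up as a backup
        self.alive = True
        self.clients = set()

    def set_role(self, role):
        # Hypothetical stand-in for the QMF hook a cluster manager would call.
        self.role = role
        if role is Role.BACKUP:
            self.clients.clear()  # a backup kicks clients off, forcing failover

    def accept(self, client):
        # Only a live primary accepts connections; backups reject them.
        if self.alive and self.role is Role.PRIMARY:
            self.clients.add(client)
            return True
        return False

    def fail(self):
        self.alive = False
        self.clients.clear()

    def restart(self):
        self.alive = True
        self.role = Role.BACKUP


def elect_primary(brokers):
    """Hypothetical manager: keep a live primary, else promote the first live broker."""
    chosen = next((b for b in brokers if b.alive and b.role is Role.PRIMARY), None)
    if chosen is None:
        chosen = next((b for b in brokers if b.alive), None)
    for b in brokers:
        b.set_role(Role.PRIMARY if b is chosen else Role.BACKUP)
    return chosen


def connect(client, brokers):
    """Client-side failover: use the first broker in the list that accepts."""
    for b in brokers:
        if b.accept(client):
            return b
    return None


if __name__ == "__main__":
    a, b = Broker("A"), Broker("B")
    brokers = [a, b]
    elect_primary(brokers)              # A is promoted
    connect("c1", brokers)
    connect("c2", brokers)
    a.fail()
    elect_primary(brokers)              # the manager promotes B
    connect("c1", brokers)
    connect("c2", brokers)
    a.restart()                         # A returns as a backup
    print(connect("c3", brokers).name)  # B: A rejects the new client
```

Whether and when to fail back to A is left entirely to the manager's policy; this sketch simply keeps the current primary while it is alive.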

> While the lack of clustering for Windows makes this an obvious
> problem for Windows users, I'd certainly argue that *nix users might also
> like to have failover and recovery without clustering.  And without
> clustering, failover without recovery is kind of useless as an HA feature
> due to the split use case.  (i.e. 2 clients talking through broker A,
> broker A fails and 2 clients failover to broker B.  Broker A comes back
> online.  Another client joins, connects to broker A.  We now have a split,
> new client cannot see old clients.)
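
The split scenario above can be reproduced with a toy Python model. The names (PlainBroker, connect, can_see) are hypothetical, and the model assumes that the brokers are not federated, so two clients can only exchange messages through the same broker, and that clients fail over in list order with no switch-back.

```python
class PlainBroker:
    """Toy broker with no primary/backup roles (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.clients = set()

    def fail(self):
        self.alive = False
        self.clients.clear()  # connected clients lose their session

    def restart(self):
        self.alive = True


def connect(client, brokers):
    """Client-side failover: attach to the first live broker in the list."""
    for b in brokers:
        if b.alive:
            b.clients.add(client)
            return b
    return None


def can_see(x, y, brokers):
    """Without federation, two clients meet only on the same broker."""
    return any(x in b.clients and y in b.clients for b in brokers)


if __name__ == "__main__":
    a, b = PlainBroker("A"), PlainBroker("B")
    brokers = [a, b]
    connect("c1", brokers)
    connect("c2", brokers)      # both clients on A
    a.fail()
    connect("c1", brokers)
    connect("c2", brokers)      # both fail over to B
    a.restart()                 # no switch-back: c1 and c2 stay on B
    connect("c3", brokers)      # the new client lands on A
    print(can_see("c1", "c2", brokers), can_see("c1", "c3", brokers))
```

Running it prints "True False": the old clients still see each other on B, but the new client on A is cut off from them.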

>>>   - how is the application notified on broker failure, connection failover,
>>> recovery?
>> It isn't. Any threads using the connection will essentially block until
>> either the connection is re-established or the configured limit is
>> reached and the client gives up trying.
>> Now that I write this, I do recall a conversation on this topic with you
>> some time back, in which this was an issue for you.
> I'd like to suggest that this remains a serious deficiency.  In most
> software, if a critical failure occurs down in middleware or its supporting
> infrastructure, it would be nice if the middleware library could report
> this to the application, so a system administrator could do something about
> it.  While it's certainly possible to rely on external monitoring systems to
> notify an admin, it's also good practice to have an application display
> some sort of error condition.  A broker failure in an ESB SOA application
> is a critical failure, and the application needs to inform its user that it
> has lost connectivity to the system.

I agree and I would like to fix that deficiency. I'm going to be working 
on reconnect/replay again in conjunction with AMQP 1.0 and will see if I 
can come up with a solution then. I have created a JIRA: 
