qpid-dev mailing list archives

From Gordon Sim <g...@redhat.com>
Subject Re: Qpid post-mortem and request for suggestions for (my) next release challenge (10M msgs/sec on Windows)
Date Mon, 17 Jun 2013 12:26:13 GMT
On 06/14/2013 03:58 PM, Kerry Bonin wrote:
> On existing broker failover - can you point me to where that behavior is
> documented?  Because neither myself or anyone on the four teams I work with
> has come across the functionality you describe.  I've never seen a client
> failover to another broker, only code to attempt to reconnect.

It appears the reconnect_urls connection option is not in fact 
documented. Sorry about that. It takes a single url or a Variant::List 
of urls to try when reconnecting.
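
As a rough sketch of how the option might be supplied: the helper below only
formats an option string in the map/list syntax that the C++ messaging client
accepts for connection options. formatReconnectOptions is a hypothetical
name, not part of Qpid, and the host names are placeholders. The commented
usage assumes the qpid::messaging::Connection constructor that takes a url
and an option string; it is not compiled here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Qpid): formats a connection option
// string that enables reconnect and lists the urls to try, e.g.
//   {reconnect: true, reconnect_urls: ['host1:5672', 'host2:5672']}
std::string formatReconnectOptions(const std::vector<std::string>& urls) {
    assert(!urls.empty());  // there must be at least one url to try
    std::string out = "{reconnect: true, reconnect_urls: [";
    for (std::size_t i = 0; i < urls.size(); ++i) {
        if (i > 0) out += ", ";
        out += "'" + urls[i] + "'";
    }
    out += "]}";
    return out;
}

// Assumed usage with the Qpid C++ messaging client (needs the Qpid headers):
//   qpid::messaging::Connection connection(
//       "host0:5672",
//       formatReconnectOptions({"host1:5672", "host2:5672"}));
//   connection.open();
```

The same value could presumably be set programmatically by passing a
Variant::List to Connection::setOption("reconnect_urls", ...), which avoids
any quoting concerns in the option string.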

>  Basic
> features we need:
> - externally adjustable retry / timeout on connections - to handle
> differences between LAN, WAN, and satellite internet.
> - updating broker list: How do you do this?  Never seen it...

There are two options. The first is that any url in the AMQP 0-10 format 
can itself contain multiple hosts, e.g. 
amqp:tcp:host1:port1,host2:port2. The second is to use the 
reconnect_urls option as above.
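
For the first option, a minimal sketch of assembling such a url from a broker
list, following the amqp:tcp:host1:port1,host2:port2 form given above.
HostPort and buildMultiHostUrl are hypothetical names used for illustration
only, and the exact url grammar should be checked against the client version
in use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical value type for one broker endpoint.
struct HostPort {
    std::string host;
    int port;
};

// Joins the endpoints into a single url of the form
//   amqp:tcp:host1:port1,host2:port2
std::string buildMultiHostUrl(const std::vector<HostPort>& brokers) {
    assert(!brokers.empty());  // a url needs at least one host
    std::string url = "amqp:tcp:";
    for (std::size_t i = 0; i < brokers.size(); ++i) {
        if (i > 0) url += ",";
        url += brokers[i].host + ":" + std::to_string(brokers[i].port);
    }
    return url;
}
```

The resulting string would be passed wherever a single broker url is passed
today, so an externally maintained broker list can be turned into one
connection url at startup.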

(When used in conjunction with the failover exchange there is a helper 
class that will receive updates and apply them: 
something similar could be done for some other distribution mechanism).
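
To illustrate the "something similar" idea for another distribution
mechanism, here is a sketch of a small holder that accepts broker list
updates from any source and hands out the current list. BrokerList is a
hypothetical class, not the Qpid helper, and ignoring empty updates is a
design assumption made here so that a bad update cannot leave a client with
nothing to reconnect to.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical holder for the currently known broker urls. Updates may
// arrive on a different thread from the one that reads the list.
class BrokerList {
public:
    // Replaces the known urls. Returns true only if the list changed;
    // empty updates are ignored (an assumption of this sketch).
    bool update(const std::vector<std::string>& urls) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (urls.empty() || urls == urls_) return false;
        urls_ = urls;
        return true;
    }

    // Returns a copy of the current list, safe to use without the lock.
    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return urls_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::string> urls_;
};
```

When update() returns true, the application could apply snapshot() to the
connection's reconnect_urls option, which is roughly what the existing
helper is described as doing for the failover exchange.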

> - to prevent network splits, how are recovered brokers monitored?  When a
> failed broker recovers, do clients switch back?  How often / aggressively
> checked?

No, there is no switch-back behaviour in the client. The new HA code
allows a broker to be classed as being in a backup or primary role, and
backups will reject or kick off any clients, causing them to fail over.
Whatever cluster management solution is in use would then detect changes
to the primary and use QMF to tell each broker what its role is.

> - how is the application notified on broker failure, connection failover,
> recovery?

It isn't. Any threads using the connection will essentially block until
either the connection is re-established or the configured limit is
reached and the client gives up trying.
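
A simplified model of that behaviour, not the actual client code: the caller
stays inside the function while the urls are tried in turn, and gets control
back only on success or once the attempt limit is reached. The name
reconnect and the tryConnect callback are hypothetical; in the real client
the limits are connection options rather than function arguments.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Tries the urls round-robin for at most 'limit' attempts.
// Returns the index of the url that accepted the connection, or -1 if
// every attempt failed. The calling thread is blocked for the duration.
int reconnect(const std::vector<std::string>& urls, int limit,
              const std::function<bool(const std::string&)>& tryConnect) {
    if (urls.empty()) return -1;
    for (int attempt = 0; attempt < limit; ++attempt) {
        std::size_t i = static_cast<std::size_t>(attempt) % urls.size();
        if (tryConnect(urls[i])) return static_cast<int>(i);
        // A real client would wait here before the next attempt.
    }
    return -1;
}
```

Nothing in this loop reports intermediate failures to the application, which
matches the point above: the only outcomes visible to the caller are a
working connection or giving up.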

Now that I write this, I do recall a conversation on this topic with you
some time back, with this being an issue for you.

> Finally, we were ending up with LOTS of application complexity in SOA code
> when broker failure / recovery meant connection, sender and receiver
> objects had to be recreated.  This was compounded by Connection being a
> different types of boost object than senders and receivers.

That's strange. Those classes all use the Handle template so I can't see 
how they would be different in that regard. I don't suppose you recall 
the details?

> And anything you can think of for dynamically load balancing across brokers?

Honestly, I think the simplest solution overall is for us to get
federation working on Windows. I assume it's some issue in the IO layer.
Does anyone have a concrete understanding of what the problem is and
what is required to fix it?

Any volunteers from our Windows experts to take a look (Cliff, Chuck,
Andrew, Steve)?

> Greatly appreciate the feedback and input...

