qpid-dev mailing list archives

From Rajith Attapattu <rajit...@gmail.com>
Subject Re: Failover
Date Mon, 19 Sep 2011 16:24:32 GMT
Thanks for posting the write up.
Comments inline.

On Fri, Sep 16, 2011 at 12:26 PM, Oleksandr Rudyy <orudyy@gmail.com> wrote:
> Hi all,
> Robbie and I created the following draft of a Failover Policy for the Qpid
> Java Client.
> Could you please comment on it?
> Qpid Java Client Failover Policy
> 1. Qpid client failover basic principles.
> ===================================================
> When connection to broker is lost a Qpid client should be able to
> re-establish connection to broker if failover policy is not switched
> off by specifying "nofailover" as a failover option in a connection
> URL.

> The failover functionality on the Qpid client should be based on the
> "stop the world" principle.  When the connection is lost and failover starts, the
> Qpid Client should not allow an invocation of any JMS operation which
> requires sending or receiving data over the network. Such operations
> should be blocked until failover functionality restores the
> connectivity by using any of the supported failover methods
> ('singlebroker', 'roundrobin', 'failover_exchange').
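
For reference, a rough sketch of how a failover method is selected in a connection URL. The layout follows the Java client's amqp:// connection URL format as I understand it; the credentials, client id, virtual host and broker addresses are placeholders, and the helper class is hypothetical, so the exact option syntax should be checked against the client documentation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the Qpid API): assembles a connection URL
// string with a broker list and a failover method.
final class FailoverUrlSketch {
    static String connectionUrl(List<String> brokers, String failoverMethod) {
        // Assumption: entries in the brokerlist option are separated by ';'.
        String brokerList = String.join(";", brokers);
        return "amqp://guest:guest@clientid/test"
                + "?brokerlist='" + brokerList + "'"
                + "&failover='" + failoverMethod + "'";
    }

    public static void main(String[] args) {
        // Two placeholder brokers tried in turn.
        System.out.println(connectionUrl(
                Arrays.asList("tcp://host1:5672", "tcp://host2:5672"), "roundrobin"));
        // Failover switched off, as described in the draft.
        System.out.println(connectionUrl(
                Arrays.asList("tcp://host1:5672"), "nofailover"));
    }
}
```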

I also think it would be useful if we could provide a brief write up
about the different failover methods we support and exactly what can
be expected.
We also need to clearly define things like reconnect_timeout,
reconnect_limit and reconnect_interval etc and should align ourselves
with the C++ and Python (Ruby...etc)  in terms of behaviour of these

> On restoring connectivity blocked JMS operations should be allowed to
> finish. If the failover functionality cannot re-establish the
> connection a JMSException should be thrown within any JMS operation
> requiring transferring data over the network.
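
To make the "stop the world" idea concrete, here is a minimal sketch of a gate that every network-bound operation would pass through. It is an illustration only, not the client's current code: the class and method names are made up, and IllegalStateException stands in for the JMSException a real client would throw.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical gate: blocks callers while failover is in progress and fails
// them once failover has given up.
final class FailoverGate {
    enum State { CONNECTED, FAILING_OVER, FAILED }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private State state = State.CONNECTED;

    void failoverStarted()   { setState(State.FAILING_OVER); }
    void failoverSucceeded() { setState(State.CONNECTED); }
    void failoverFailed()    { setState(State.FAILED); }

    private void setState(State newState) {
        lock.lock();
        try {
            state = newState;
            stateChanged.signalAll(); // wake every blocked operation
        } finally {
            lock.unlock();
        }
    }

    // Called at the start of each operation that needs the network.
    void awaitConnected() {
        lock.lock();
        try {
            while (state == State.FAILING_OVER) {
                stateChanged.awaitUninterruptibly();
            }
            if (state == State.FAILED) {
                // Stand-in for the JMSException described in the draft.
                throw new IllegalStateException("failover could not restore the connection");
            }
        } finally {
            lock.unlock();
        }
    }
}
```

An operation such as send() would call awaitConnected() first, so it either proceeds on the restored connection or fails with an exception.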

We need to come up with a clear strategy of notifying exceptions, an
issue where we are falling short in the current implementation.
If a connection listener is used then we have to notify via the
connection listener as per the spec.

Then comes the question of what are we going to do with a JMS
operation that transfers data over the network.
Are we still going to throw an exception as mentioned above? or are we
going to throw an exception IFF there is no exception listener ?

Next up, if we are throwing a JMS exception, how will the application
know the difference between a session exception and a connection
exception ?
Ex.  Resource limit exceeded vs Connection issue ?

In the current code, this is an area which is causing deadlocks and
race conditions. This is also an area that is causing confusion among our
users as they are not sure how exactly the client is going to behave.
Therefore, IMO, it's imperative to fix our exception handling
if we are to provide our users with a deterministic failover experience.

> On successful failover, it is expected that client JMS session should
> restore all temporary queues(topics) if such were created before
> failover.

This is not always the case and depends on what reliability guarantees
are specified on a per destination basis via the "reliability" option.
If "unreliable" or "at-most-once" is used, then I believe the
above is fine.

If "at-least-once" is used then we cannot simply create a new
temporary queue, as between the time the client failed over and
reconnected, there may have been messages sent to the previous temp
queue.
If we simply create a new temp queue then those messages on the old
queue will be lost.
This is certainly the case with the C++ broker which supports
clustering and with both brokers in the event of a temporary network

> 2. Description of failover behavior for JMS operations
> ==================================================
> Session.commit() rolls back the transaction and throws a
> TransactionRolledBackException if the session is dirty (there were
> messages sent/received in the transaction) when failover occurred,
> allowing the user to replay their transaction on the new Session.

We need more clarification here.
1. Are we going to allow the same transacted session to be used just
like the way we allow non transacted JMS sessions? In other words are
we going to allow the JMS Session to resume once the exception is
2. Or do we ask the application to create a new JMS session before
continuing with the work ?

Currently on 0-10 path we do #2, but there are customer requests to support #1.
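
From the application's side, option #1 would look roughly like the sketch below: the same session is reused and the whole unit of work is replayed after the rollback. TxSession and RolledBackException are stand-ins for javax.jms.Session and javax.jms.TransactionRolledBackException so that the example is self-contained; under option #2 the application would create a new session before replaying.

```java
import java.util.List;

// Stand-in for javax.jms.TransactionRolledBackException.
class RolledBackException extends RuntimeException {
    RolledBackException(String message) { super(message); }
}

// Stand-in for the transacted parts of javax.jms.Session.
interface TxSession {
    void send(String body);
    void commit();
}

final class TransactionReplaySketch {
    // Sends the batch and commits; if failover rolled the transaction back,
    // replays the whole batch. Returns the number of attempts used.
    static int commitWithReplay(TxSession session, List<String> batch, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                for (String body : batch) {
                    session.send(body);
                }
                session.commit();
                return attempt;
            } catch (RolledBackException e) {
                // Nothing from this attempt was committed, so redo all of it.
            }
        }
        throw new IllegalStateException(
                "transaction still rolled back after " + maxAttempts + " attempts");
    }
}
```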

> Session.recover() and Message#acknowledge() should throw a
> JMSException if there has been any message delivery since the last
> acknowledgment was sent and failover occurred whilst the session was
> 'dirty'

Perhaps here we may have to make a distinction between,
1. Messages that were in the prefetch buffer but not yet delivered to
the application
2. Messages that were delivered to the application but not yet acked
(unacked message buffer).

AFAIK we currently mark both categories of messages as redelivered.
There has been some push back on this, asking us to mark only the
delivered but unacked messages as "redelivered".
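
A small sketch of the requested behaviour, keeping the two buffers separate so that only messages the application has already seen are flagged. This is illustrative only: the class is hypothetical, and whether the flag ends up being set by the broker or by the client is left open.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a consumer's two message buffers.
final class RedeliverySketch {
    static final class Msg {
        final String id;
        boolean redelivered;
        Msg(String id) { this.id = id; }
    }

    // Received from the broker but not yet handed to the application.
    private final Deque<Msg> prefetched = new ArrayDeque<Msg>();
    // Handed to the application but not yet acknowledged.
    private final List<Msg> unacked = new ArrayList<Msg>();

    void onPrefetch(Msg m) { prefetched.addLast(m); }

    Msg deliverNext() {
        Msg m = prefetched.pollFirst();
        if (m != null) {
            unacked.add(m);
        }
        return m;
    }

    void acknowledgeAll() { unacked.clear(); }

    // Returns the messages to deliver again after failover, in original order.
    List<Msg> onFailover() {
        List<Msg> replay = new ArrayList<Msg>();
        for (Msg m : unacked) {
            m.redelivered = true; // the application has seen this one before
            replay.add(m);
        }
        replay.addAll(prefetched); // never seen by the application: flag untouched
        unacked.clear();
        prefetched.clear();
        return replay;
    }
}
```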

> Message consumer:
> No further messages sent to the client by the old broker should be
> received by the consumer after failover has occurred, only messages
> sent by the new broker should be available to consumers.

Agreed, but pls see the above comment.

> Queue browser:
> If failover occurred while iterating through QueueBrowser enumerations
> a sub-class of NoSuchElementException should be thrown by default.
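
As a rough illustration of the proposal, the enumeration returned by a browser could look like the sketch below. The exception name and the wrapper class are made up for the example; the only real type relied on is java.util.NoSuchElementException.

```java
import java.util.Enumeration;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sub-class signalling that browsing was cut short by failover.
class FailoverNoSuchElementException extends NoSuchElementException {
    FailoverNoSuchElementException(String message) { super(message); }
}

// Hypothetical enumeration over browsed message ids.
final class BrowserEnumerationSketch implements Enumeration<String> {
    private final Iterator<String> messages;
    private volatile boolean failedOver;

    BrowserEnumerationSketch(Iterator<String> messages) {
        this.messages = messages;
    }

    // Would be called by the failover machinery.
    void markFailedOver() { failedOver = true; }

    public boolean hasMoreElements() {
        return messages.hasNext();
    }

    public String nextElement() {
        if (failedOver) {
            throw new FailoverNoSuchElementException("failover occurred while browsing");
        }
        return messages.next();
    }
}
```

Code that already catches NoSuchElementException keeps working, while callers that care can catch the sub-class and restart the browse.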


> 3. Issues with acknowledgments
> ==================================================
> The acknowledge operation should not return till all messages have
> actually been acknowledged. Currently it is possible for messages not
> to be acknowledged after invoking the acknowledge operation. The
> acknowledge is done lazily by acknowledgment flusher, however, this is
> not what the JMS spec requires.

I believe we need to handle 3 cases during failover.
I have provided a detailed analysis of the above in QPID-3462

> The JMS spec requires that after each call to the receive operation or
> completion of each onMessage the received message should be
> acknowledged. Currently this does not happen as the acknowledge is
> done lazily by acknowledgment flusher which does not give the same
> guarantee. This is in fact DUPS_OK_ACKNOWLEDGE behavior. The
> flusher thread should not be running for AUTO_ACKNOWLEDGE.

In order to be spec compliant applications need to use -Dsync_ack=true
which will force the client to sync with the broker after each
onMessage or receive call.
However this is terribly slow, hence the reason why it behaves like DUPS_OK.

I don't necessarily condone this behaviour (neither did I code this
part),  but making it spec compliant might suddenly make a lot of
applications crawl :D
So we need to be very careful here.  If we do, we need to make a lot of
noise about this in release notes etc...

The ack-flusher was done to periodically flush message completions
during periods of relatively slow message flows.
If we fix AUTO_ACK then I agree this does not need to run during AUTO_ACK.
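
The difference between the two behaviours can be sketched as follows. This is a simplified model, not the client's code: the class is hypothetical, the "broker" is just a list, and flush() stands in for what the ack-flusher does periodically.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of lazy versus synchronous acknowledgement.
final class AckSketch {
    private final boolean syncAck;
    // Completions the client has recorded but not yet sent.
    private final List<String> pending = new ArrayList<String>();
    // Stand-in for what the broker has been told.
    private final List<String> ackedAtBroker = new ArrayList<String>();

    AckSketch(boolean syncAck) { this.syncAck = syncAck; }

    // Called after receive() returns or onMessage() completes.
    void afterDelivery(String messageId) {
        pending.add(messageId);
        if (syncAck) {
            flush(); // spec-compliant AUTO_ACK: the broker knows before we move on
        }
    }

    // In lazy mode this is what the flusher would call periodically.
    void flush() {
        ackedAtBroker.addAll(pending);
        pending.clear();
    }

    List<String> unflushed() { return new ArrayList<String>(pending); }
    List<String> acked()     { return new ArrayList<String>(ackedAtBroker); }
}
```

In lazy mode anything still in the pending list when the connection drops would be delivered again, which is why it behaves like DUPS_OK.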

> Flusher thread is started but it is not really needed because
> acknowledgment for transacted sessions is handled differently.
> It seems that a flusher thread make sense to run in a

This is actually a bug. As you say we only need to run it during
DUPS_OK and AUTO_ACK (until auto-ack gets fixed).

> The flusher thread should not be running for NO_ACKNOWLEDGE.

We need to clearly define how the JMS ack modes will work along side
the "reliability" option in address strings.

> The same issue as with AUTO_ACKNOWLEDGE.
> Is anybody using this mode? Does it make sense to keep it?

I don't know if anybody is using it. I wonder what the use case behind this is.

> Kind Regards,
> Alex
> ---------------------------------------------------------------------
> Apache Qpid - AMQP Messaging Implementation
> Project:      http://qpid.apache.org
> Use/Interact: mailto:dev-subscribe@qpid.apache.org
