qpid-dev mailing list archives

From "Alan Conway" <acon...@redhat.com>
Subject Re: Review Request 25151: QPID-5975: HA extra/missing messages when running qpid-txtest2 in a loop with failover.
Date Thu, 28 Aug 2014 21:42:40 GMT


> On Aug. 28, 2014, 3:15 p.m., Gordon Sim wrote:
> > > Unfortunately, with this fix qpid-txtest2 is no longer a useful test for TX
> > > failover because it regularly raises TransactionUnknown and there's not much
> > > we can do with that.
> > 
> > I disagree. The tool is written so that the initialisation and verification
> > can be run separately from the actual transactional transfer. So even as is,
> > I think it's a very useful test (you just need to ensure that you only do
> > failover after initialising, and then stop failing over while verifying the
> > state of the queues).
> > 
> > Not only that, but it would of course be possible to handle the
> > TransactionUnknown by checking the state of the queues (via QMF, or by trying
> > to fetch the messages and then releasing them back) and make a decision based
> > on that.
> > 
> > > A better test of TX atomicity with failover is to run a pair of
> > > qpid-send/qpid-receive with fail-over and verify that the number of
> > > enqueues/dequeues and message depth are a multiple of the transaction size.
> > 
> > I don't think that is as good a test of atomicity, and it certainly isn't
> > better, since it doesn't cover the most common case of having consumed and
> > published messages in the same transaction.

I agree with your disagreement. Subsequent to making that remark I did put together a useful
test exactly as you describe - check the JIRA for details. I didn't add any logic to handle
TransactionUnknown, I just let the transfer phase fail but run the verification phase anyway
since you should always have the same total set of messages regardless of how many of the transactions
you manage to run.
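The verification-phase invariant can be expressed as a multiset comparison. A minimal sketch, assuming the message ids loaded during initialisation and the ids drained from each queue during verification are available as Python lists (the function name is hypothetical, not part of qpid-txtest2):

```python
from collections import Counter


def verify_conservation(initial, final_queues):
    """Return True if the messages drained from all queues are exactly the
    messages loaded at initialisation: nothing lost, nothing duplicated.

    initial      -- list of message ids loaded before the transfer phase
    final_queues -- list of lists, the message ids drained from each queue
    """
    drained = Counter()
    for queue in final_queues:
        drained.update(queue)
    # Counter equality compares ids *and* multiplicities, so a duplicate
    # or a missing message both make the check fail.
    return drained == Counter(initial)
```

This holds however many transfer transactions completed before a failure, because each committed transaction only moves messages between queues.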


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25151/#review51780
-----------------------------------------------------------


On Aug. 28, 2014, 2:27 p.m., Alan Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25151/
> -----------------------------------------------------------
> 
> (Updated Aug. 28, 2014, 2:27 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Robbie Gemmell.
> 
> 
> Bugs: QPID-5975
>     https://issues.apache.org/jira/browse/QPID-5975
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> This is partly not a bug; there is also a client error-handling issue, which
> has been corrected.
> 
> qpid-txtest2 initializes a queue with messages at the start and drains the
> queues at the end. These operations are *not transactional*. Therefore
> duplicates are expected if there is a failover during initialization or
> draining. When duplicates were observed, there was indeed a failover at one of
> these times.
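A toy model of why a failover during non-transactional initialisation can leave duplicates. This is plain Python, not qpid code; it assumes the broker had already stored every message but the confirmations for the last few were lost, so the client resends them after reconnecting:

```python
def send_with_failover(messages, confirmed_before_failover):
    """Toy model of at-least-once sending across a failover.

    messages                  -- ids the client sends
    confirmed_before_failover -- how many of them the client saw confirmed
                                 before the connection dropped
    Assumption: the broker stored *all* messages before the failure; only
    the confirmations for the tail were lost.
    """
    queue = list(messages)                      # broker already holds everything
    unconfirmed = messages[confirmed_before_failover:]
    queue.extend(unconfirmed)                   # client replays unconfirmed sends
    return queue
```

For example, `send_with_failover(["m1", "m2", "m3", "m4"], 2)` leaves `m3` and `m4` on the queue twice.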
> 
> Making these operations transactional is not enough to pass: the test then
> fails with "no messages to fetch". This is explained as follows:
> 
> If there is a failover during a transaction, TransactionAborted is raised. The
> client assumes the transaction was rolled back and re-plays it. However, if the
> failover occurs at a critical point *after* the client has sent commit
> but *before* it has received a response, then the client *does not know*
> whether the transaction was committed or rolled-back on the new primary.
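The distinction can be sketched as a small decision function. The `Phase` names are hypothetical labels for where the connection failed, not qpid identifiers:

```python
from enum import Enum


class Phase(Enum):
    BEFORE_COMMIT_SENT = 1     # failure while the transaction was still open
    AWAITING_COMMIT_REPLY = 2  # commit sent, no response yet: the critical window
    COMMIT_CONFIRMED = 3       # the broker acknowledged the commit


def outcome_after_failover(phase):
    """What the client can conclude about its transaction after a failover."""
    if phase is Phase.BEFORE_COMMIT_SENT:
        return "rolled back"   # uncommitted work is discarded, so replay is safe
    if phase is Phase.AWAITING_COMMIT_REPLY:
        return "unknown"       # may or may not have committed on the new primary
    return "committed"
```

Only the middle case is ambiguous, which is why blindly replaying after any failover can duplicate a transaction.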
> 
> Re-playing in this case can duplicate the transaction. Each transaction moves
> messages from one queue to another, so as long as transactions are atomic the
> total number of messages will not change. However, if transactions are
> duplicated, a transactional session may try to move more messages than exist on
> the queue, hence "no messages to fetch". For example, if thread 1 moves N
> messages from q1 to q2, and thread 2 tries to move N+M messages back, then
> thread 2 will fail.
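A minimal simulation of that failure mode, assuming each transaction atomically moves a fixed batch between in-memory lists (`transfer` and `NoMessagesToFetch` are illustrative names, not the test's code). Replaying a transaction that had in fact committed drains the source queue early, so a later planned transfer finds too few messages:

```python
class NoMessagesToFetch(Exception):
    """Stand-in for the fetch failure reported by the test (hypothetical name)."""


def transfer(queues, src, dst, n):
    """Atomically move n messages from queues[src] to queues[dst].

    Either all n messages move or none do, mimicking a transaction.
    """
    if len(queues[src]) < n:
        raise NoMessagesToFetch(f"{src} has {len(queues[src])} messages, need {n}")
    moved = queues[src][:n]
    del queues[src][:n]
    queues[dst].extend(moved)


queues = {"q1": ["m1", "m2", "m3", "m4"], "q2": []}
transfer(queues, "q1", "q2", 2)  # tx 1: committed on the broker
transfer(queues, "q1", "q2", 2)  # tx 1 replayed after failover: a duplicate
# tx 2, which the client planned all along, now fails:
# transfer(queues, "q1", "q2", 2) raises NoMessagesToFetch
```

The total across both queues is still four; the symptom is the failed fetch, not a lost message.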
> 
> This problem has been corrected as follows: the C++ and Python clients now
> raise the following exceptions:
> 
> - TransactionAborted: The transaction has definitely been rolled back due to a
>   connection failure before commit or a broker error (e.g. a store error) during commit.
>   It can safely be replayed.
> 
> - TransactionUnknown: The transaction outcome is unknown because the connection
>   failed at the critical time. There's no simple automatic way to know what
>   happened without examining the state of the broker queues.
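One possible client-side policy for the two exceptions, sketched in plain Python. The exception classes are local stand-ins so the sketch runs without a qpid installation, and `was_committed` is a hypothetical callback that would have to examine broker queue state (for example via QMF, as suggested earlier in this thread); neither is the client library's API:

```python
class TransactionAborted(Exception):
    """Local stand-in for the client exception of the same name."""


class TransactionUnknown(Exception):
    """Local stand-in for the client exception of the same name."""


def run_transaction(do_tx, was_committed, max_attempts=5):
    """Run do_tx() until its outcome is known; return the attempts used.

    do_tx         -- callable that performs the transactional work and commits
    was_committed -- hypothetical callable that inspects broker state to decide
                     whether the last commit actually took effect
    """
    for attempt in range(1, max_attempts + 1):
        try:
            do_tx()
            return attempt
        except TransactionAborted:
            continue              # definitely rolled back: safe to replay
        except TransactionUnknown:
            if was_committed():
                return attempt    # it landed; replaying would duplicate it
            continue              # it did not land; replay
    raise RuntimeError("transaction did not complete")
```

The key design point is that `TransactionUnknown` never triggers a blind replay; the retry decision is deferred to an explicit check of broker state.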
> 
> Unfortunately, with this fix qpid-txtest2 is no longer a useful test for TX
> failover because it regularly raises TransactionUnknown and there's not much we
> can do with that.
> 
> A better test of TX atomicity with failover is to run a pair of
> qpid-send/qpid-receive with failover and verify that the number of
> enqueues/dequeues and message depth are a multiple of the transaction size. See
> the JIRA for such a test. (Note these tests also sometimes raise
> TransactionUnknown, but it doesn't matter since all we are checking is that
> messages go on and off the queues in multiples of the TX size.)
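The check itself is simple. A sketch, assuming the three counters have already been read from the broker (for example from QMF queue statistics); the function name is hypothetical:

```python
def counts_are_tx_multiples(enqueues, dequeues, depth, tx_size):
    """Atomicity check for the qpid-send/qpid-receive test.

    Every committed transaction adds or removes exactly tx_size messages,
    so all three counters must be exact multiples of tx_size. A partially
    applied transaction would leave a remainder on at least one of them.
    """
    return all(n % tx_size == 0 for n in (enqueues, dequeues, depth))
```

For example, with a transaction size of 10, counters of 100/90/10 pass and 101/90/11 fail.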
> 
> Note: the original bug also reported seeing missing messages from
> qpid-txtest2. I don't have a good explanation for that, but since the
> qpid-send/receive test shows that transactions are atomic I am going to let that
> go for now.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/cpp/bindings/qpid/python/qpid_messaging.i 1621106 
>   trunk/qpid/cpp/bindings/qpid/ruby/ruby.i 1621106 
>   trunk/qpid/cpp/include/qpid/messaging/Session.h 1621106 
>   trunk/qpid/cpp/include/qpid/messaging/exceptions.h 1621106 
>   trunk/qpid/cpp/src/libqpidmessaging-api-symbols.txt 1621106 
>   trunk/qpid/cpp/src/qpid/client/amqp0_10/SessionImpl.h 1621106 
>   trunk/qpid/cpp/src/qpid/client/amqp0_10/SessionImpl.cpp 1621106 
>   trunk/qpid/cpp/src/qpid/messaging/exceptions.cpp 1621106 
>   trunk/qpid/cpp/src/tests/ha_test.py 1621106 
>   trunk/qpid/cpp/src/tests/ha_tests.py 1621106 
>   trunk/qpid/python/qpid/messaging/driver.py 1621106 
>   trunk/qpid/python/qpid/messaging/endpoints.py 1621106 
>   trunk/qpid/python/qpid/messaging/exceptions.py 1621106 
> 
> Diff: https://reviews.apache.org/r/25151/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alan Conway
> 
>

