qpid-dev mailing list archives

From "Alan Conway" <acon...@redhat.com>
Subject Review Request 14213: QPID-5139: HA transactions block a thread, can deadlock the broker
Date Wed, 18 Sep 2013 19:24:21 GMT

This is an automatically generated e-mail. To reply, visit:

Review request for qpid, Andrew Stitcher, Gordon Sim, and Steve Huston.

Repository: qpid


QPID-5139: HA transactions block a thread, can deadlock the broker

PrimaryTxObserver::prepare blocks its worker thread until every backup has
responded. With concurrent transactions this can deadlock the broker: once all
worker threads are blocked in prepare, no thread is left to receive the backups'
responses.

The solution is as follows:
- before blocking in prepare, start a new worker thread.
- after prepare unblocks, stop a worker thread.
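The two steps above can be sketched as an RAII guard around the blocking wait. `Pool`, `addWorker`, `removeWorker`, `BlockingScope` and `runPrepares` are hypothetical names invented for this sketch; the patch implements the idea in the poller code (PollerThreads, EpollPoller), whose interface is not reproduced here. Stopping a worker is modelled lazily, as a request that the next idle worker exits, which is also an assumption of the sketch.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical elastic worker pool (a sketch, not the patch's PollerThreads).
class Pool {
public:
    explicit Pool(int workers) {
        std::lock_guard<std::mutex> lock(m_);
        for (int i = 0; i < workers; ++i) startWorkerLocked();
    }
    ~Pool() {
        { std::lock_guard<std::mutex> lock(m_); stopping_ = true; }  // no new workers after this
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(m_);
        tasks_.push_back(std::move(task));
        cv_.notify_all();
    }
    // Step 1: before a task blocks, start one more worker.
    void addWorker() {
        std::lock_guard<std::mutex> lock(m_);
        startWorkerLocked();
    }
    // Step 2: after the task unblocks, the next idle worker exits.
    void removeWorker() {
        std::lock_guard<std::mutex> lock(m_);
        ++retire_;
        cv_.notify_all();
    }
    int peakWorkers() {
        std::lock_guard<std::mutex> lock(m_);
        return peak_;
    }
private:
    void startWorkerLocked() {
        if (stopping_) return;
        if (++live_ > peak_) peak_ = live_;
        threads_.emplace_back([this] { run(); });
    }
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return retire_ > 0 || stopping_ || !tasks_.empty(); });
            if (retire_ > 0) { --retire_; break; }
            if (tasks_.empty()) break;  // stopping with nothing left to run
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            lock.unlock();
            task();
            lock.lock();
        }
        --live_;
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    std::vector<std::thread> threads_;
    int live_ = 0, peak_ = 0, retire_ = 0;
    bool stopping_ = false;
};

// Hypothetical RAII guard: one extra worker exists while the caller is blocked.
struct BlockingScope {
    Pool& pool;
    explicit BlockingScope(Pool& p) : pool(p) { pool.addWorker(); }
    ~BlockingScope() { pool.removeWorker(); }
};

// Models `txns` concurrent prepares, each blocking until its backup's response
// task has run on the same pool; returns how many were answered in time.
int runPrepares(Pool& pool, int txns, std::chrono::milliseconds timeout) {
    struct State { std::mutex m; std::condition_variable cv; int completed = 0, finished = 0; };
    auto state = std::make_shared<State>();
    for (int t = 0; t < txns; ++t) {
        pool.post([&pool, state, timeout] {
            auto responded = std::make_shared<bool>(false);
            pool.post([state, responded] {  // the backup's response
                std::lock_guard<std::mutex> lock(state->m);
                *responded = true;
                state->cv.notify_all();
            });
            BlockingScope scope(pool);  // keeps a worker free to run the response
            std::unique_lock<std::mutex> lock(state->m);
            if (state->cv.wait_for(lock, timeout, [&] { return *responded; })) ++state->completed;
            ++state->finished;
            state->cv.notify_all();
        });
    }
    std::unique_lock<std::mutex> lock(state->m);
    state->cv.wait(lock, [&] { return state->finished == txns; });
    return state->completed;
}
```

With two base workers and ten transactions, mirroring the new ha_tests.py test, every prepare completes and the pool never exceeds twelve workers. Putting the "stop a worker" step in a destructor means it also runs if the wait exits by an exception.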

This ensures that there are always more worker threads than pending
transactions, and also that we do not grow the worker thread pool by more than
the number of concurrent transactions.
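The accounting behind both guarantees, as an idealized sketch that treats starting and stopping a worker as instantaneous: let N ≥ 1 be the configured number of worker threads, p(t) the number of transactions blocked in prepare at time t, W(t) the number of worker threads, and P_max the largest number of concurrent transactions.

```latex
\begin{aligned}
W(t)        &= N + p(t)               && \text{one worker started per blocked prepare} \\
W(t) - p(t) &= N \ge 1                && \text{workers still free to read backup responses} \\
W(t) - N    &= p(t) \le P_{\max}      && \text{pool growth bounded by concurrency}
\end{aligned}
```

For the new ha_tests.py test (N = 2, 10 concurrent transactions) this gives at most 2 + 10 = 12 worker threads.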

An alternative would be to make prepare complete asynchronously. I believe that
approach would be more complex and riskier, and it would be specific to the 0-10
protocol.
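For comparison, the asynchronous alternative would replace the blocked thread with a completion callback. The hypothetical `AsyncPrepare` class below is not code from the patch: it counts outstanding backup responses and resumes the transaction from whichever thread delivers the last one. It leaves out everything protocol-specific, such as how the broker would defer its reply to the client's prepare.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical sketch of an asynchronous prepare: no thread waits. The
// transaction is resumed by `done` once the last backup has responded.
class AsyncPrepare {
public:
    using Completion = std::function<void(bool ok)>;

    AsyncPrepare(int backups, Completion done)
        : pending_(backups), done_(std::move(done)) {
        assert(backups > 0);  // with no backups the caller can complete directly
    }

    // Called by whichever thread delivers a backup's response.
    void response(bool ok) {
        Completion done;
        bool allOk = false;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (pending_ == 0) return;   // already completed; ignore extras
            ok_ = ok_ && ok;
            if (--pending_ > 0) return;  // still waiting for other backups
            done = std::move(done_);
            allOk = ok_;
        }
        done(allOk);  // run the continuation outside the lock
    }

private:
    std::mutex mutex_;
    int pending_;
    bool ok_ = true;
    Completion done_;
};
```

No thread waits, so the pool size never has to change; the cost is that the transaction's state must survive between the call to prepare and the callback, and every caller has to be written as a continuation.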

TODO: implement for Windows and the other pollers. Any hints would be much appreciated!


  /trunk/qpid/cpp/src/CMakeLists.txt 1524063 
  /trunk/qpid/cpp/src/qpid/broker/Broker.h 1524063 
  /trunk/qpid/cpp/src/qpid/broker/Broker.cpp 1524063 
  /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.cpp 1524063 
  /trunk/qpid/cpp/src/qpid/sys/Poller.h 1524063 
  /trunk/qpid/cpp/src/qpid/sys/PollerThreads.h PRE-CREATION 
  /trunk/qpid/cpp/src/qpid/sys/PollerThreads.cpp PRE-CREATION 
  /trunk/qpid/cpp/src/qpid/sys/epoll/EpollPoller.cpp 1524063 
  /trunk/qpid/cpp/src/tests/ha_test.py 1524063 
  /trunk/qpid/cpp/src/tests/ha_tests.py 1524063 
  /trunk/qpid/cpp/src/tests/test_store.cpp 1524063 

Diff: https://reviews.apache.org/r/14213/diff/


New ha_tests.py unit test: starts a broker with 2 threads and runs 10 concurrent
transactions. It fails reliably before the fix is applied and passed more than
300 iterations after the fix.

Full ctest passes.


Alan Conway
