qpid-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (QPID-6972) BDB HA: Node may remain detached from group following loss of quorum
Date Thu, 14 Jan 2016 14:05:39 GMT

    [ https://issues.apache.org/jira/browse/QPID-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098130#comment-15098130 ]

ASF subversion and git services commented on QPID-6972:

Commit 1724616 from orudyy@apache.org in branch 'java/branches/6.0.x'
[ https://svn.apache.org/r1724616 ]

QPID-6972: Delegate exception handling decisions on flushLog failures to EnvironmentFacade

           merged from trunk
           svn merge -c 1724582 https://svn.apache.org/repos/asf/qpid/java/trunk

> BDB HA: Node may remain detached from group following loss of quorum
> --------------------------------------------------------------------
>                 Key: QPID-6972
>                 URL: https://issues.apache.org/jira/browse/QPID-6972
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Broker
>    Affects Versions: 0.30, 0.32, qpid-java-6.0
>            Reporter: Keith Wall
>              Labels: bdbstore, high-availability
> If a master detects that it has lost quorum (which may occur when a user-generated
> transaction, or an internally generated 'ping' transaction, fails to receive the required
> number of replica acknowledgements), the underlying JE environment {{ReplicatedEnvironment}}
> is automatically restarted (the old one is closed and a new one is created to replace it).
> This approach ensures that clients reconnect to a new master in a timely way.
> There is a coding error in the CoalescingCommitter that means that the JE environment
> restart may not complete properly. If quorum disappears whilst there are jobs on the
> CoalescingCommitter's job queue, the CoalescingCommitter's error handling will cause the
> BDB EnvironmentFacade to be closed. This is okay for the BDB non-HA case as such an
> exception is always fatal, but for HA, calling {{ReplicatedEnvironmentFacade#close()}}
> prevents the environment from being recreated.
> The effect of this defect is that a node may disappear from the group every time quorum
> is temporarily lost. This will keep occurring until quorum no longer remains, at which
> point the business will stop. Bouncing the affected brokers (or restarting the VHNs) will
> restore the service, without message loss.
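
For illustration only, here is a minimal Java sketch of the idea named in the commit message: the committer hands a flushLog failure to the facade instead of closing it. It is not the Qpid code. Every name in it (SketchFacade, handleFlushFailure, SketchStandaloneFacade, SketchReplicatedFacade, SketchCommitter) is hypothetical, and the "restart" is simulated with a counter rather than a real JE environment.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for the broker's facade; not the real Qpid interface.
interface SketchFacade {
    void flushLog();                              // throws RuntimeException on failure
    void handleFlushFailure(RuntimeException e);  // the facade decides how to react
    boolean isOpen();
}

// Non-HA case: a flush failure is treated as fatal, so the facade closes itself.
final class SketchStandaloneFacade implements SketchFacade {
    private boolean open = true;

    public void flushLog() {
        throw new IllegalStateException("simulated flush failure");
    }

    public void handleFlushFailure(RuntimeException e) {
        open = false;
    }

    public boolean isOpen() {
        return open;
    }
}

// HA case: a flush failure triggers an environment restart; the facade stays open.
final class SketchReplicatedFacade implements SketchFacade {
    private boolean quorum = false;
    private int restarts = 0;

    public void flushLog() {
        if (!quorum) {
            throw new IllegalStateException("simulated loss of quorum");
        }
    }

    public void handleFlushFailure(RuntimeException e) {
        restarts++;     // stands in for closing and recreating the JE environment
        quorum = true;  // assumption: quorum is back once the restart completes
    }

    public boolean isOpen() {
        return true;
    }

    int restarts() {
        return restarts;
    }
}

// Committer that batches jobs and leaves the failure policy to the facade.
final class SketchCommitter {
    private final SketchFacade facade;
    private final Queue<String> jobs = new ArrayDeque<>();

    SketchCommitter(SketchFacade facade) {
        this.facade = facade;
    }

    void submit(String job) {
        jobs.add(job);
    }

    // Returns the number of jobs completed by this flush, or 0 if the flush failed.
    int flush() {
        try {
            facade.flushLog();
            int completed = jobs.size();
            jobs.clear();
            return completed;
        } catch (RuntimeException e) {
            jobs.clear();                  // queued jobs are failed, not retried
            facade.handleFlushFailure(e);  // delegate; the committer never calls close()
            return 0;
        }
    }
}
```

With the replicated facade, a failed flush leaves the facade open and records one restart, so a later flush can succeed. With the standalone facade, the same failure closes the facade, matching the "always fatal" non-HA behaviour described above.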

