qpid-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Rudyy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (QPID-7192) [Java Broker] BDB HA Virtual Host Node does not restart successfully if JE environment has locally committed transactions requiring rollback
Date Mon, 11 Apr 2016 11:49:25 GMT

     [ https://issues.apache.org/jira/browse/QPID-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Alex Rudyy updated QPID-7192:
-----------------------------
    Attachment:     (was: QPID-7192-wip.diff)

> [Java Broker] BDB HA Virtual Host Node does not restart successfully if JE environment
has locally committed transactions requiring rollback
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: QPID-7192
>                 URL: https://issues.apache.org/jira/browse/QPID-7192
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Broker
>    Affects Versions: qpid-java-6.0, qpid-java-6.1
>            Reporter: Alex Rudyy
>         Attachments: QPID-7192-wip.diff
>
>
> Start-up of BDB HA VHN fails in the following scenario:
> 1) When Transaction on Master node is in process of commit  and the replicas BDB HA VHNs
are stopped at the same time, the transaction is aborted with InsufficientReplicasException
> 2) BDB HA VHN for replicas are restarted and the Replica environment detects a transaction
requiring rollback from previous commit. The rollback transaction causes restart of JE Environment.
> 3) BDB HA VHN for impacted replica continues activation and performs an intruder protection
checks but Environment is restarting which results in exception "ConnectionScopedRuntimeException:
Environment is restarting". The exception puts BDB HA VHN into ERRORED state. JE environment
re-join the group successfully but  BDB HA VHN is not aware about it, as StateChangeListener
is not set and BDB HA VHN does not receive notifications about state transitions
> A manual operator intervention is required to recover from the issue: BDB HA VHN needs
to be started up again by invoking state change operation from Web Management Console or REST
API.
>  
> Here are JE Exceptions reported on Replica side on transaction rollback:
> {noformat}
> BROKER-20 WARN  [DETACHED nodetestInFlightTransactionsWhilstMajorityIsLost10004(2)] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade
test:nodetestInFlightTransactionsWhilstMajorityIsLost10004 has transaction(s) ahead of the
current master. These must be discarded to allow this node to rejoin the group. This condition
is normally caused by the use of weak durability options.
> BROKER-20 DEBUG [DETACHED nodetestInFlightTransactionsWhilstMajorityIsLost10004(2)] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade
Environment restarting due to exception (JE 5.0.104) nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
Node nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
must rollback 3 commits to the earliest point indicated by transaction id=-142 time=2016-04-08
00:16:16.384 vlsn=356 lsn=0x0/0x8332 in order to rejoin the replication group. All existing
ReplicatedEnvironment handles must be closed and reinstantiated.  Log files were truncated
to file 0x0, offset 0x33469, vlsn 353 HARD_RECOVERY: Rolled back past transaction commit or
abort. Must run recovery by re-opening Environment handles Environment is invalid and must
be closed.
> com.sleepycat.je.rep.RollbackException: (JE 5.0.104) nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
Node nodetestInFlightTransactionsWhilstMajorityIsLost10004(2):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost-20-6184300293922905690/test/config
must rollback 3 commits to the earliest point indicated by transaction id=-142 time=2016-04-08
00:16:16.384 vlsn=356 lsn=0x0/0x8332 in order to rejoin the replication group. All existing
ReplicatedEnvironment handles must be closed and reinstantiated.  Log files were truncated
to file 0x0, offset 0x33469, vlsn 353 HARD_RECOVERY: Rolled back past transaction commit or
abort. Must run recovery by re-opening Environment handles Environment is invalid and must
be closed.
>         at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupHardRecovery(ReplicaFeederSyncup.java:650)
~[je-5.0.104.jar:5.0.104]
>         at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:341)
~[je-5.0.104.jar:5.0.104]
>         at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.execute(ReplicaFeederSyncup.java:148)
~[je-5.0.104.jar:5.0.104]
>         at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:605) ~[je-5.0.104.jar:5.0.104]
>         at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:396)
~[je-5.0.104.jar:5.0.104]
>         at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:332) ~[je-5.0.104.jar:5.0.104]
>         at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1506) ~[je-5.0.104.jar:5.0.104]
> {noformat}
> The above causes the below
> {noformat}
> BROKER-20 DEBUG [HttpManagement-http-74] o.a.q.s.m.p.f.ExceptionHandlingFilter Exception
in servlet '/api/latest/virtualhostnode/nodetestInFlightTransactionsWhilstMajorityIsLost10004':
> org.apache.qpid.server.util.ConnectionScopedRuntimeException: Environment is restarting
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.getEnvironment(ReplicatedEnvironmentFacade.java:1342)
~[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.getNodes(ReplicatedEnvironmentFacade.java:965)
~[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.virtualhostnode.berkeleydb.BDBHAVirtualHostNodeImpl.activate(BDBHAVirtualHostNodeImpl.java:361)
~[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.virtualhostnode.AbstractVirtualHostNode.doActivate(AbstractVirtualHostNode.java:160)
~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_80]
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
~[na:1.7.0_80]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[na:1.7.0_80]
>         at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_80]
>         at org.apache.qpid.server.model.AbstractConfiguredObject.attainState(AbstractConfiguredObject.java:1308)
~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.model.AbstractConfiguredObject.attainState(AbstractConfiguredObject.java:1287)
~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.model.AbstractConfiguredObject.attainStateIfOpenedOrReopenFailed(AbstractConfiguredObject.java:1271)
~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.model.AbstractConfiguredObject.access$1700(AbstractConfiguredObject.java:80)
~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.model.AbstractConfiguredObject$15.execute(AbstractConfiguredObject.java:1519)
~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.model.AbstractConfiguredObject$15.execute(AbstractConfiguredObject.java:1455)
~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.model.AbstractConfiguredObject$2.execute(AbstractConfiguredObject.java:561)
~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.model.AbstractConfiguredObject$2.execute(AbstractConfiguredObject.java:554)
~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.configuration.updater.TaskExecutorImpl$TaskLoggingWrapper.execute(TaskExecutorImpl.java:270)
~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.configuration.updater.TaskExecutorImpl$CallableWrapper$1.run(TaskExecutorImpl.java:342)
~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_80]
>         at javax.security.auth.Subject.doAs(Subject.java:356) ~[na:1.7.0_80]
>         at org.apache.qpid.server.configuration.updater.TaskExecutorImpl$CallableWrapper.call(TaskExecutorImpl.java:335)
~[qpid-broker-core-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
~[na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
~[na:1.7.0_80]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> {noformat}
> Test MultiNodeTest.testInFlightTransactionsWhilstMajorityIsLost sporadically fails because
of this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Mime
View raw message