qpid-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Keith Wall (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (QPID-6997) [Java Broker, BDBStore] HA JE Transaction commit might end up in NullPointerException when commit is invoked when majority is lost
Date Mon, 18 Jan 2016 09:27:39 GMT

    [ https://issues.apache.org/jira/browse/QPID-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15104998#comment-15104998
] 

Keith Wall commented on QPID-6997:
----------------------------------

I think rather than aborting transactions, it would be preferable if we organised for the
restart of the environment to first ensure no new connections can be formed, then close existing
connections/pause the housekeeper and only then restart the environment.  That way there should
be no transactions in flight.  I think this will be a cleaner solution.  I don't like the
idea of trying to abort transactions whilst connections are still using them.  

> [Java Broker, BDBStore] HA JE Transaction commit might end up in NullPointerException
when commit is invoked when majority is lost
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: QPID-6997
>                 URL: https://issues.apache.org/jira/browse/QPID-6997
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Broker
>    Affects Versions: qpid-java-6.0
>            Reporter: Alex Rudyy
>         Attachments: TEST-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testTransferMasterWhilstTransactionsInFlight.txt.NPE
>
>
> On running some tests for BDB HA  with default settings (ReplicaAckPolicy=SIMPLE_MAJORITY)
involving publishing messages in parallel from multiple connections and disconnecting replica
nodes in the middle of publishing transactions, in some rare circumstances the publishing
transaction JE commit failed with NPE thrown from JE code as below:
> {noformat}
> ########################################################################
> #
> # Unhandled Exception java.lang.NullPointerException in Thread IO-/127.0.0.1:35408
> #
> # Exiting
> #
> ########################################################################
> java.lang.NullPointerException
>  at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:130)
>  at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1130)
>  at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:191)
>  at com.sleepycat.je.txn.Txn.commit(Txn.java:756)
>  at com.sleepycat.je.Transaction.doCommit(Transaction.java:609)
>  at com.sleepycat.je.Transaction.commit(Transaction.java:410)
>  at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.commitAsync(ReplicatedEnvironmentFacade.java:302)
>  at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.commitTranAsyncImpl(AbstractBDBMessageStore.java:794)
>  at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.access$1200(AbstractBDBMessageStore.java:74)
>  at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.commitTranAsync(AbstractBDBMessageStore.java:1364)
>  at org.apache.qpid.server.txn.LocalTransaction.commitAsync(LocalTransaction.java:399)
>  at org.apache.qpid.server.protocol.v0_8.AMQChannel.commit(AMQChannel.java:1220)
>  at org.apache.qpid.server.protocol.v0_8.AMQChannel.receiveTxCommit(AMQChannel.java:3622)
>  at org.apache.qpid.codec.ServerDecoder.processMethod(ServerDecoder.java:228)
>  at org.apache.qpid.codec.AMQDecoder.processFrame(AMQDecoder.java:191)
>  at org.apache.qpid.server.protocol.v0_8.BrokerDecoder.doProcessFrame(BrokerDecoder.java:114)
>  at org.apache.qpid.server.protocol.v0_8.BrokerDecoder.access$000(BrokerDecoder.java:36)
>  at org.apache.qpid.server.protocol.v0_8.BrokerDecoder$1.run(BrokerDecoder.java:78)
>  at org.apache.qpid.server.protocol.v0_8.BrokerDecoder$1.run(BrokerDecoder.java:74)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at org.apache.qpid.server.protocol.v0_8.BrokerDecoder.processFrame(BrokerDecoder.java:73)
>  at org.apache.qpid.codec.AMQDecoder.processInput(AMQDecoder.java:173)
>  at org.apache.qpid.codec.AMQDecoder.decode(AMQDecoder.java:114)
>  at org.apache.qpid.codec.ServerDecoder.decodeBuffer(ServerDecoder.java:43)
>  at org.apache.qpid.server.protocol.v0_8.AMQPConnection_0_8$1.run(AMQPConnection_0_8.java:266)
>  at org.apache.qpid.server.protocol.v0_8.AMQPConnection_0_8$1.run(AMQPConnection_0_8.java:258)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at org.apache.qpid.server.protocol.v0_8.AMQPConnection_0_8.received(AMQPConnection_0_8.java:257)
>  at org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:142)
>  at org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:547)
>  at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
>  at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:446)
>  at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:253)
>  at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:108)
>  at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:499)
>  at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:337)
>  at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:86)
>  at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:457)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It seems that majority was lost in the middle of transaction. Qpid HA logic detected
that and restarted the environment, however, the transaction was not aborted.
> On closing of the environment the following has been reported:
> {noformat}
> Problem: 3 locks left
> ---- LSN: 0x0/0x2d80----
>  ThinLockAddr:1794979774 Owner:1919034342 -50_Selector-virtualhost-test-iopool_MasterTxn
Waiters: (none)
> ---- LSN: 0x0/0x1ed8----
>  ThinLockAddr:374788562 Owner:1919034342 -50_Selector-virtualhost-test-iopool_MasterTxn
Waiters: (none)
> ---- LSN: 0x0/0x20db----
>  ThinLockAddr:501824883 Owner:1919034342 -50_Selector-virtualhost-test-iopool_MasterTxn
Waiters: (none)
> [BROKER-1] o.a.q.t.u.SpawnedBrokerHolder
> Problem: 1 txns left
> txnId = -50 txnName = 1919034342 -50_Selector-virtualhost-test-iopool_MasterTxn
> Local Cache Usage = 240
> Cache Layout: Allocation of resources in the cache.
>  adminBytes=0
>  cacheTotalBytes=1,622,051
>  dataAdminBytes=0
>  dataBytes=0
>  lockBytes=240
>  sharedCacheTotalBytes=0
> WARN  [Environment-test:nodetestTransferMasterWhilstTransactionsInFlight10001] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade
Ignoring an exception whilst closing environment
> com.sleepycat.je.EnvironmentFailureException: (JE 5.0.104) nodetestTransferMasterWhilstTransactionsInFlight10001(1):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testTransferMasterWhilstTransactionsInFlight-1-8967653656689695718/test/config
Problem closing handle nodetestTransferMasterWhilstTransactionsInFlight10001(1) UNEXPECTED_EXCEPTION:
Unexpected internal Exception, may have side effects. Environment is invalid and must be closed.
>         at com.sleepycat.je.rep.ReplicatedEnvironment.close(ReplicatedEnvironment.java:709)
~[je-5.0.104.jar:5.0.104]
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.closeEnvironmentOnRestart(ReplicatedEnvironmentFacade.java:1277)
[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.restartEnvironment(ReplicatedEnvironmentFacade.java:1253)
[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.access$500(ReplicatedEnvironmentFacade.java:92)
[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$3.run(ReplicatedEnvironmentFacade.java:481)
[qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_80]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> Caused by: java.lang.IllegalStateException: Unclosed Database: MESSAGE_CONTENT
> Unclosed Database: MESSAGE_CONTENT
> Databases left open: 2
> There is 1 existing transaction opened against the Environment.
> Aborting open transactions ...
> aborting <Transaction id="-46">
>         at com.sleepycat.je.Environment.close(Environment.java:390) ~[je-5.0.104.jar:5.0.104]
>         at com.sleepycat.je.rep.ReplicatedEnvironment.close(ReplicatedEnvironment.java:702)
~[je-5.0.104.jar:5.0.104]
>         ... 7 common frames omitted
> {noformat}
> It appears we need to abort all transactions whilst restarting the environment



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Mime
View raw message