qpid-dev mailing list archives

From "Keith Wall (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (QPID-7078) [Java Broker,HA] BDB HA VHN in master role designated as primary can sporadically transit into unknown role after losing second replica node
Date Tue, 01 Mar 2016 14:47:18 GMT

    [ https://issues.apache.org/jira/browse/QPID-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173844#comment-15173844 ]

Keith Wall commented on QPID-7078:
----------------------------------

This appears to be a JE defect. If it occurs again, we will investigate whether Qpid can be changed to work around it.

> [Java Broker,HA] BDB HA VHN in master role designated as primary can sporadically transit into unknown role after losing second replica node
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: QPID-7078
>                 URL: https://issues.apache.org/jira/browse/QPID-7078
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Broker
>    Affects Versions: 0.32, qpid-java-6.0, qpid-java-6.0.1, qpid-java-6.1
>            Reporter: Alex Rudyy
>         Attachments: TEST-org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped.txt
>
>
> Failure of test TwoNodeTest#testDesignatedPrimaryContinuesAfterSecondaryStopped revealed unexpected behavior in BDB JE: the master node designated as primary suddenly transited into the UNKNOWN role after the second replica node was shut down.
> The test failed as follows:
> {noformat}
> testDesignatedPrimaryContinuesAfterSecondaryStopped(org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest)  Time elapsed: 7.236 sec  <<< ERROR!
> javax.jms.JMSException: Error registering consumer: org.apache.qpid.QpidException: Fail-over exception interrupted basic consume.
> 	at org.apache.qpid.client.AMQSession.registerConsumer(AMQSession.java:3093)
> 	at org.apache.qpid.client.AMQSession.access$400(AMQSession.java:94)
> 	at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2094)
> 	at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2069)
> 	at org.apache.qpid.client.AMQConnectionDelegate_8_0.executeRetrySupport(AMQConnectionDelegate_8_0.java:416)
> 	at org.apache.qpid.client.AMQConnection.executeRetrySupport(AMQConnection.java:737)
> 	at org.apache.qpid.client.failover.FailoverRetrySupport.execute(FailoverRetrySupport.java:90)
> 	at org.apache.qpid.client.AMQSession.createConsumerImpl(AMQSession.java:2067)
> 	at org.apache.qpid.client.AMQSession.createConsumer(AMQSession.java:989)
> 	at org.apache.qpid.client.AMQConnection.retrieveVirtualHostPropertiesIfNecessary(AMQConnection.java:809)
> 	at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:796)
> 	at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:771)
> 	at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:765)
> 	at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:88)
> 	at org.apache.qpid.test.utils.QpidBrokerTestCase.assertProducingConsuming(QpidBrokerTestCase.java:1256)
> 	at org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped(TwoNodeTest.java:108)
> Caused by: org.apache.qpid.client.failover.FailoverException: Failing over about to start
> 	at org.apache.qpid.client.AMQProtocolHandler.notifyFailoverStarting(AMQProtocolHandler.java:434)
> 	at org.apache.qpid.client.AMQProtocolHandler$1.run(AMQProtocolHandler.java:287)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> On the broker side, the transition into the UNKNOWN state was logged as follows:
> {noformat}
> 10:15:44,279 B-10000 DEBUG [Group-Change-Learner:test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.DatabasePinger Ping transaction completed
> 10:15:44,279 B-10000 DEBUG [IO-/127.0.0.1:58662] o.a.q.s.p.v.BrokerDecoder Frame handled in 1344 ms.
> 10:15:44,279 B-10000 INFO  [MASTER nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001(1)] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade The node 'test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001' state is UNKNOWN
> 10:15:44,279 B-10000 DEBUG [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Received BDB event, new BDB state UNKNOWN Facade state : OPEN
> 10:15:44,279 B-10000 INFO  [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.v.b.BDBHAVirtualHostNodeImpl Received BDB event indicating transition from state MASTER to UNKNOWN for nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001
> 10:15:44,280 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]']
> 10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.m.AbstractConfiguredObject Closing BDBHAVirtualHostImpl : test
> 2016-02-17 10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.v.AbstractVirtualHost Closing connection registry :1 connections.
> 10:15:44,282 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]'] performed successfully with result: null
> 10:15:44,283 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on '/127.0.0.1:58662(guest)']
> 10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.m.AbstractConfiguredObject Closing AMQPConnection_0_8 : [1] 127.0.0.1:58662
> 10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on '/127.0.0.1:58662(guest)'] performed successfully with result: null
> {noformat}
> The transition into the UNKNOWN state should not happen, because the MASTER node is designated as primary. The observed behavior indicates a BDB JE bug.
> It is unclear whether the JE Environment can recover from this unexpected flip into the UNKNOWN state. If JE can recover, then on the next transition into MASTER the VHN should recover the VH, and connected applications can continue as usual. If JE cannot recover, the BDB HA VHN will not recover automatically from this condition, because we do not restart the environment on MasterUnknownException. Operator intervention would then be required to restart the BDB HA VHN.
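The report's reasoning (a designated-primary MASTER should never drop to UNKNOWN, and if it does, the VHN either recovers on the next MASTER transition or needs an operator) can be modelled as a small state machine. This is a hypothetical Java sketch, not code from Qpid's BDBHAVirtualHostNodeImpl or from BDB JE: the class DesignatedPrimaryMonitor, its Action values and the onEnvironmentUnrecoverable() hook are illustrative assumptions, and the local NodeState enum only mirrors the names of JE's ReplicatedEnvironment.State values so the example compiles without JE on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Qpid or BDB JE source) of how a virtual host node could react
 * to replication state changes. NodeState mirrors the names of JE's
 * ReplicatedEnvironment.State values so the example runs without JE on the classpath.
 */
final class DesignatedPrimaryMonitor {

    enum NodeState { DETACHED, UNKNOWN, MASTER, REPLICA }

    /** Hypothetical actions the virtual host node (VHN) takes on its virtual host (VH). */
    enum Action { NONE, ACTIVATE_VIRTUAL_HOST, CLOSE_VIRTUAL_HOST, RECOVER_VIRTUAL_HOST, OPERATOR_RESTART_REQUIRED }

    private final boolean designatedPrimary;
    private final List<String> anomalies = new ArrayList<>();
    private NodeState current = NodeState.DETACHED;
    private boolean awaitingRecovery = false;

    DesignatedPrimaryMonitor(boolean designatedPrimary) {
        this.designatedPrimary = designatedPrimary;
    }

    /** The invariant from the report: a designated-primary MASTER should not drop to UNKNOWN. */
    static boolean isUnexpected(NodeState previous, NodeState next, boolean designatedPrimary) {
        return designatedPrimary && previous == NodeState.MASTER && next == NodeState.UNKNOWN;
    }

    /** Handles one state-change event and returns what the VHN should do with its VH. */
    Action onStateChange(NodeState next) {
        NodeState previous = current;
        current = next;
        if (isUnexpected(previous, next, designatedPrimary)) {
            // Record the anomaly; in the failing test this is the MASTER -> UNKNOWN flip.
            anomalies.add(previous + " -> " + next);
        }
        if (next == NodeState.MASTER) {
            if (awaitingRecovery) {
                // JE recovered: the VH is recovered and clients can carry on.
                awaitingRecovery = false;
                return Action.RECOVER_VIRTUAL_HOST;
            }
            return previous == NodeState.MASTER ? Action.NONE : Action.ACTIVATE_VIRTUAL_HOST;
        }
        if (previous == NodeState.MASTER) {
            // Leaving MASTER closes the VH and its connections, as in the broker log.
            awaitingRecovery = true;
            return Action.CLOSE_VIRTUAL_HOST;
        }
        return Action.NONE;
    }

    /**
     * Called if JE never recovers. Per the report, the environment is not restarted on
     * MasterUnknownException, so a closed VH stays closed until an operator restarts the VHN.
     */
    Action onEnvironmentUnrecoverable() {
        return awaitingRecovery ? Action.OPERATOR_RESTART_REQUIRED : Action.NONE;
    }

    List<String> anomalies() {
        return anomalies;
    }
}
```

Replaying the failing test's sequence (MASTER, then UNKNOWN while designated as primary) records one anomaly and closes the virtual host. A later MASTER event yields RECOVER_VIRTUAL_HOST, whereas calling onEnvironmentUnrecoverable() before that yields OPERATOR_RESTART_REQUIRED, matching the two outcomes described above.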



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


