qpid-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (QPID-6560) [Java Broker] BDB HA JE environment close on intruder detection might block the execution of VHN children tasks thus causing unecessary delays in shutdown of ReplicatedEnvironmentFacade executors
Date Fri, 29 May 2015 14:33:17 GMT

    [ https://issues.apache.org/jira/browse/QPID-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564889#comment-14564889 ]

ASF subversion and git services commented on QPID-6560:
-------------------------------------------------------

Commit 1682487 from orudyy@apache.org in branch 'java/trunk'
[ https://svn.apache.org/r1682487 ]

QPID-6560: Remove redundant close of ReplicationEnvironmentFacade on intruder detection as
it might block on facade thread executors shutdown caused by tasks scheduled to execute on
VHN close and HA events

> [Java Broker] BDB HA JE environment close on intruder detection might block the execution
of VHN children tasks thus causing unecessary delays in shutdown of ReplicatedEnvironmentFacade
executors
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: QPID-6560
>                 URL: https://issues.apache.org/jira/browse/QPID-6560
>             Project: Qpid
>          Issue Type: Bug
>    Affects Versions: 0.32
>            Reporter: Alex Rudyy
>            Assignee: Alex Rudyy
>             Fix For: 6.0 [Java]
>
>
> On intruder detection, a task to close the VHN children and set the VHN state to ERRORED is
scheduled on the Broker configuration thread. Immediately after scheduling the task, ReplicatedEnvironmentFacade.close()
is invoked.
> The ReplicatedEnvironmentFacade executors are shut down in the close method.
> If any of the ReplicatedEnvironmentFacade executors has pending work (tasks to run), and
that work must be performed synchronously on the VHN configuration thread or the Broker configuration
thread (blocking the ReplicatedEnvironmentFacade executor threads), the executor shutdown
blocks and eventually times out.
> The test BDBHAVirtualHostNodeRestTest.testIntruderProtection fails sporadically, as shown
by the stack trace below:
> {noformat}
> junit.framework.AssertionFailedError: Attribute state did not reach expected value within
permitted timeout 5000ms. expected:<ERRORED> but was:<ACTIVE>
> 	at junit.framework.Assert.fail(Assert.java:57)
> 	at junit.framework.Assert.failNotEquals(Assert.java:329)
> 	at junit.framework.Assert.assertEquals(Assert.java:78)
> 	at junit.framework.TestCase.assertEquals(TestCase.java:244)
> 	at org.apache.qpid.systest.rest.QpidRestTestCase.waitForAttributeChanged(QpidRestTestCase.java:117)
> 	at org.apache.qpid.server.store.berkeleydb.replication.BDBHAVirtualHostNodeRestTest.testIntruderProtection(BDBHAVirtualHostNodeRestTest.java:311)
> {noformat}
> Log analysis showed that the issue occurs in the following scenario:
> * A 2-node cluster is created.
> * An intruder node is connected.
> * node1 is shut down by intruder protection.
> * Intruder protection is triggered on node2, and a task to close the VHN children is scheduled
on the Broker configuration thread. At the same time, JE issues a STATE event on the transition
from REPLICA to UNKNOWN (as the majority is lost). The state change logic is invoked in the
ReplicatedEnvironmentFacade StateChange executor, which in turn performs the VH close on the
VHN configuration thread and blocks until the VH close is completed.
> * As a result, the VHN configuration thread is performing the VHN children close caused by
intruder protection; the StateChange executor thread is waiting for completion of the VH close
task, which is scheduled as a separate task; and the Broker configuration thread is performing
REF.close (ReplicatedEnvironmentFacade.close), waiting for the shutdown of the StateChange executor.
When the task to close the VHN children completes, it schedules a task on the Broker configuration
thread to close the configuration store. The latter can only be performed after the intruder
protection logic has completed.
> * Thus, we have an effective deadlock, where the tasks block each other's threads.
> It seems that REF.close in the intruder protection functionality is not only redundant but
harmful, as it causes the effective deadlock. The deadlock is resolved by the timeout on waiting
for a task executor shutdown.
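
The blocking pattern described above can be reproduced in isolation with plain
java.util.concurrent executors. The following is only a minimal sketch, not the Qpid code:
the class FacadeCloseSketch, the method run and all variable names are hypothetical, and a
single "configuration" thread stands in for both the Broker and the VHN configuration threads.
It assumes that the state change task hands its work to the configuration thread and blocks
on the result, as the description states.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the Qpid code base.
final class FacadeCloseSketch {

    // Returns true if the state change executor terminated within the timeout.
    static boolean run(boolean closeInsideConfigTask, long timeoutMillis) throws Exception {
        ExecutorService configThread = Executors.newSingleThreadExecutor();
        ExecutorService stateChangeExecutor = Executors.newSingleThreadExecutor();
        CountDownLatch configTaskRunning = new CountDownLatch(1);
        CountDownLatch vhCloseQueued = new CountDownLatch(1);

        // Stand-in for the intruder handling running on the configuration thread.
        Future<Boolean> intruderTask = configThread.submit(() -> {
            configTaskRunning.countDown();
            vhCloseQueued.await(); // make sure the state change task is already pending
            if (!closeInsideConfigTask) {
                return true; // no close here; the executor is shut down later
            }
            // The "redundant close": shut the executor down and wait for it while
            // still occupying the configuration thread.
            stateChangeExecutor.shutdown();
            return stateChangeExecutor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        });

        // Stand-in for the state change handling: it delegates the "VH close" to the
        // configuration thread and blocks until that work is done.
        Future<?> stateChangeTask = stateChangeExecutor.submit(() -> {
            configTaskRunning.await();
            Future<?> vhClose = configThread.submit(() -> { });
            vhCloseQueued.countDown();
            vhClose.get(); // cannot finish while the configuration thread is busy
            return null;
        });

        boolean terminated = intruderTask.get();
        if (!closeInsideConfigTask) {
            // Shut down from outside the configuration thread, after its task is done.
            stateChangeExecutor.shutdown();
            terminated = stateChangeExecutor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        }
        stateChangeTask.get(); // completes once the configuration thread is free again
        configThread.shutdown();
        return terminated;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("close inside config task: terminated=" + run(true, 300));
        System.out.println("close after config task:  terminated=" + run(false, 5000));
    }
}
```

In this sketch the first call cannot terminate in time, because the queued "VH close" can only
run after awaitTermination has given up, so the wait always lasts the full timeout. The second
call omits the close from the configuration thread task, and the executor then shuts down
without waiting for the timeout.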



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org

