qpid-dev mailing list archives

From "Alan Conway (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (QPID-4286) QMF queries for HA replication take too long to process
Date Mon, 15 Oct 2012 21:39:03 GMT

     [ https://issues.apache.org/jira/browse/QPID-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway resolved QPID-4286.
-------------------------------

    Resolution: Fixed

Committed Jason's patch on trunk:
------------------------------------------------------------------------   
r1398530 | aconway | 2012-10-15 17:35:38 -0400 (Mon, 15 Oct 2012) | 6 lines

QPID-4286: QMF queries for HA replication take too long to process (Jason Dillaman)

Rework ManagementAgent locks, get rid of shared buffers that were points of contention.

Minor log message improvements in ha code.

------------------------------------------------------------------------

                
> QMF queries for HA replication take too long to process
> -------------------------------------------------------
>
>                 Key: QPID-4286
>                 URL: https://issues.apache.org/jira/browse/QPID-4286
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker
>    Affects Versions: 0.18
>            Reporter: Jason Dillaman
>            Assignee: Alan Conway
>         Attachments: qpid-4286-fixes.patch, qpid-4286.patch
>
>
> In an HA broker with approximately 12,000 queues, it takes roughly 10-14 seconds for
> the first QMF response fragment to arrive. While the QMF management agent is collecting
> the response, all other QMF-related functionality is blocked, which in turn blocks any
> thread that raises a QMF event.
>
> Worker threads blocked by QMF cause clients to be disconnected from the broker (either
> through missed heartbeats in an extreme case or through the 2-second handshake timeout).
> The HA backup's federated link is also disconnected due to missed heartbeats when the
> link heartbeat interval is set to a low value.
>
> If the HA backup loses its connection, the problem gets worse: the backup reconnects
> and re-queries the QMF data that made it lose its connection in the first place.
>
> Recommend that QMF events not be blocked by a global management agent lock, and that
> potentially long-running QMF queries be separated from the worker thread that initiated
> them to prevent a heartbeat timeout.
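
The first recommendation above (do not let a long query hold a lock that
event raisers also need) can be illustrated with a minimal sketch. This is
not the committed patch and not Qpid's ManagementAgent API: SketchAgent,
ManagedObject, and every method name below are hypothetical, chosen only to
show the pattern of separate locks plus a short snapshot under the object
lock, with the slow encoding done on a private buffer. It assumes that
encoding an object is safe without the agent's lock held.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a managed object such as a queue; not Qpid's API.
struct ManagedObject {
    std::string name;
    explicit ManagedObject(std::string n) : name(std::move(n)) {}
    // Assumed to be safe to call without the agent's lock held.
    std::string encode() const { return "queue:" + name; }
};

// Hypothetical agent: one lock for the object table, another for events.
class SketchAgent {
public:
    void addObject(const std::string& name) {
        std::lock_guard<std::mutex> guard(objectLock_);
        objects_[name] = std::make_shared<ManagedObject>(name);
    }

    // Events take only eventLock_, so a running query cannot block them.
    void raiseEvent(const std::string& event) {
        std::lock_guard<std::mutex> guard(eventLock_);
        events_.push_back(event);
    }

    std::size_t eventCount() const {
        std::lock_guard<std::mutex> guard(eventLock_);
        return events_.size();
    }

    // Build a query response while holding the object lock only briefly.
    std::vector<std::string> query() const {
        std::vector<std::shared_ptr<ManagedObject>> snapshot;
        {
            // Short critical section: copy pointers, nothing else.
            std::lock_guard<std::mutex> guard(objectLock_);
            snapshot.reserve(objects_.size());
            for (const auto& entry : objects_) snapshot.push_back(entry.second);
        }
        // Slow part: encode into a private buffer with no lock held.
        std::vector<std::string> response;
        response.reserve(snapshot.size());
        for (const auto& object : snapshot) response.push_back(object->encode());
        return response;
    }

private:
    mutable std::mutex objectLock_;
    mutable std::mutex eventLock_;
    std::map<std::string, std::shared_ptr<ManagedObject>> objects_;
    std::vector<std::string> events_;
};
```

With this shape, a thread calling raiseEvent() waits at most for another
event to be appended, regardless of how many objects query() has to encode.
The second recommendation (moving long-running queries off the initiating
worker thread) is a separate concern and is not covered by this sketch.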

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org

