qpid-dev mailing list archives

From "Pavel Moravec (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (QPID-3796) QMF errors ignored by cluster, causing cluster de-sync
Date Wed, 08 Aug 2012 07:19:09 GMT

     [ https://issues.apache.org/jira/browse/QPID-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Pavel Moravec closed QPID-3796.

    Resolution: Not A Problem

Quoting Ken Giusti:

The result, while not ideal, cannot be prevented because the cluster cannot be guaranteed
to operate correctly in this configuration.

The host environment differs between the clustered brokers - one host has more available disk space
than the other.   This contradicts the prescribed deployment guidelines for clustering - the
environments must provide equivalent resources. If that does not hold, discrepancies will
eventually be introduced.

> QMF errors ignored by cluster, causing cluster de-sync
> ------------------------------------------------------
>                 Key: QPID-3796
>                 URL: https://issues.apache.org/jira/browse/QPID-3796
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker
>    Affects Versions: 0.12
>            Reporter: Pavel Moravec
>         Attachments: create_queue.cpp
> Cluster error handling ignores errors on QMF. As a result, a node affected by an error
that the other nodes did not see is left running, i.e. the cluster de-syncs.
> Particular example: Via QMF, create a huge durable queue on a 2 node cluster, such that
node1 of the cluster does not have sufficient free disk space for the queue journals, while
node2 has enough free disk space. The cluster won't detect that node1 failed to create the queue,
leaving the cluster running with one node that has the queue and one node that does not.
> Reproduction scenario:
> 1) 2 node cluster running
> 2) Leave less than 13M of free disk space on node1 (with enough free space on node2)
> 3) On node1, run the attached simple program that will create queue HugeDurableQueue
with qpid.file_count=64 and qpid.file_size=16384.
> 4) The QMF response will be negative (correct), but both nodes keep running, with the queue
provisioned on node2 but not on node1.
> 5) Repeating the test by sending the QMF command to node2 (which has enough free disk space)
produces a _positive_ QMF response - the user is _not_ made aware of any problem on the cluster.
> Both problems (node1 needs to be shut down, and the QMF response has to be a NACK every time)
should be fixed.
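
To make the disk-space mismatch in the reproduction scenario concrete, here is a rough sketch of the arithmetic. It assumes that qpid.file_size is counted in 64 KiB pages, as in the legacy journal store options; that unit, the helper names, and the free-space figure for node2 are assumptions for illustration and are not stated in the report. Real journals may also need some extra space on top of this estimate.

```python
# Assumption: journal files are sized in 64 KiB pages (not stated in the report).
PAGE_BYTES = 64 * 1024


def journal_bytes(file_count: int, file_size_pages: int) -> int:
    """Estimated disk space for one queue's journal files, in bytes."""
    return file_count * file_size_pages * PAGE_BYTES


def nodes_unable_to_create(free_bytes_by_node: dict, file_count: int,
                           file_size_pages: int) -> list:
    """Names of nodes whose free disk space is below the journal estimate."""
    needed = journal_bytes(file_count, file_size_pages)
    return sorted(name for name, free in free_bytes_by_node.items()
                  if free < needed)


# Values from the report: qpid.file_count=64, qpid.file_size=16384,
# and less than 13M free on node1. The node2 figure is hypothetical.
free = {"node1": 13 * 1024**2, "node2": 100 * 1024**3}
needed = journal_bytes(64, 16384)
failing = nodes_unable_to_create(free, 64, 16384)

print("journal estimate (GiB):", needed // 1024**3)
print("nodes that cannot create the queue:", failing)
if failing and len(failing) < len(free):
    # Only some nodes fail, so the queue ends up on a subset of the cluster.
    print("nodes would diverge on this queue")
```

Under that assumption the queue needs far more space than node1 has free, so only node2 can provision it, which is the divergence described in step 4.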
