cloudstack-users mailing list archives

From Wido den Hollander <w...@widodh.nl>
Subject Re: RBD primary storage VM encounters Exclusive Lock after triggering HA
Date Tue, 28 May 2019 11:42:18 GMT


On 5/28/19 6:16 AM, li jerry wrote:
> Hello guys
> 
> We’ve deployed an environment with CloudStack 4.11.2 and KVM (CentOS 7.6),
> and Ceph 13.2.5 is deployed as the primary storage.
> We found some issues with the HA solution, and we are here to ask for
> your suggestions.
> 
> We’ve enabled both the VM HA and Host HA features in CloudStack, and the
> compute offering is tagged as ha.
> When we perform a power failure test (unplug 1 node of 4), the VMs running
> on the removed node are automatically rescheduled to the other living
> nodes after 5 minutes, but none of them can boot into the OS. We found
> that the boot procedure is stuck on I/O read/write failures.
> 
> 
> 
> The following is displayed on the console after the VM starts:
> 
> Generating "/run/initramfs/rdsosreport.txt"
> 
> Entering emergency mode. Exit the shell to continue.
> Type "journalctl" to view system logs.
> You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
> after mounting them and attach it to a bug report
> 
> :/#
> 
> 
> 
> We found this is caused by the lock on the image:
> [root@cn01-nodea ~]# rbd lock list a93010b0-2be2-49bd-b25e-ec89b3a98b4b
> There is 1 exclusive lock on this image.
> Locker         ID                  Address
> client.1164351 auto 94464726847232 10.226.16.128:0/3002249644
> 
> If we remove the lock from the image, and restart the VM under CloudStack,
> this VM will boot successfully.
> 
> We know that disabling the Exclusive Lock feature (by setting
> rbd_default_features = 3) in Ceph would solve this problem. But we don’t
> think it’s the best solution for HA, so could you please give us some
> ideas about how you are doing this and what the best practice is for this
> feature?
> 

exclusive-lock is there to prevent a split-brain, where two clients
write to the same image at the same time.
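
For context on the "rbd_default_features = 3" setting quoted above: the
value is a bitmask, and 3 leaves the exclusive-lock bit unset. A minimal
sketch of decoding such a value follows. The bit values are the standard
RBD feature flags as I understand them (they are not stated in this
thread), and decode_rbd_features is a hypothetical helper name.

```shell
#!/bin/sh
# Decode an RBD feature bitmask into feature names.
# Assumption: standard RBD feature bit values (not taken from this thread).
decode_rbd_features() {
    mask=$1
    out=""
    for pair in 1:layering 2:striping 4:exclusive-lock 8:object-map \
                16:fast-diff 32:deep-flatten 64:journaling; do
        bit=${pair%%:*}     # numeric bit value before the colon
        name=${pair#*:}     # feature name after the colon
        if [ $(( mask & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_rbd_features 3    # prints: layering striping
```

With 3, only layering and striping are enabled, which is why the lock
problem disappears but so does the split-brain protection.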

The lock should be released to the other client if this is requested,
but I have the feeling that you might have a cephx problem there.

Can you post the output of:

$ ceph auth get client.X

Replace X with the user you are using for CloudStack. Also remove
the 'key'; I don't need that.

I want to look at the caps of the user.
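
One way to share the caps without the secret is to filter the key line
out before pasting. This is only a sketch: redact_key is a hypothetical
helper, and it assumes the usual keyring-style output in which the
secret appears on a line of the form "key = ...".

```shell
#!/bin/sh
# Drop the "key = ..." line from keyring-style text read on stdin,
# leaving the entity name and the "caps ..." lines intact.
# Assumption: the secret sits on its own line starting with "key".
redact_key() {
    grep -v -E '^[[:space:]]*key[[:space:]]*='
}

# Usage (client.X is the placeholder from the message above):
#   ceph auth get client.X | redact_key
```

Check the result by eye before posting, in case the output format
differs on your version.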

Wido

> Thanks.
> 
> 
