cloudstack-users mailing list archives

From ilya <ilya.mailing.li...@gmail.com>
Subject Re: KVM HA is broken, let's fix it
Date Fri, 16 Oct 2015 23:44:59 GMT
Please see another thread on DEV that proposes the fix for KVM HA ->
[DISCUSS] KVM HA with IPMI Fencing


----

We propose the following solution that in our understanding should cover
all use cases and provide a fencing mechanism.

NOTE: The proposed IPMI fencing is just a script. If you are using HP
hardware with ILO, it could be an ILO executable with specific
parameters. In theory, this can be *any* script, not just IPMI.
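
To illustrate the "any script" idea, here is a minimal sketch of a wrapper that runs whatever fencing command an operator configures and reports success only when it exits with status 0. The function name `fence_host` and the timeout default are assumptions made for illustration; they are not taken from the proposal or from CloudStack.

```python
import subprocess
from typing import Sequence


def fence_host(command: Sequence[str], timeout_s: float = 30.0) -> bool:
    """Run an operator-supplied fencing command (hypothetical helper).

    The command could be an IPMI tool, a vendor executable, or any other
    script. Fencing counts as successful only if the command exits with
    status 0 before the timeout expires.
    """
    try:
        result = subprocess.run(
            list(command),
            timeout=timeout_s,
            check=False,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing script or a hung command is treated as "not fenced".
        return False
    return result.returncode == 0
```

Treating a missing or hung script as a failure is a deliberate choice in this sketch: the caller should not assume a host is fenced unless the script confirmed it.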

Please take a few minutes to read this through, to avoid duplicate efforts...


Proposed FS below:
----------------

https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+HA+with+IPMI+Fencing


On 10/12/15 12:54 AM, Frank Louwers wrote:
> 
>> On 10 Oct 2015, at 12:35, Remi Bergsma <RBergsma@schubergphilis.com> wrote:
>>
>> Can you please explain what the issue is with KVM HA? In my tests, HA starts all
>> VMs just fine without the hypervisor coming back. At least that is on current 4.6. Assuming
>> a cluster of multiple nodes of course. It will then do a neighbor check from another host
>> in the same cluster.
>>
>> Also, malfunctioning NFS leads to corruption and therefore we fence a box when the
>> shared storage is unreliable. Combining primary and secondary NFS is not a good idea for production
>> in my opinion.
> 
> Well, it depends how you look at it, and what your situation is.
> 
> If you use 1 NFS export as primary storage (and only NFS), then yes, the system works
> as one would expect, and doesn’t need to be fixed.
> 
> However, HA is “not functioning” in any of these scenarios:
> 
> - you don’t use NFS as your only primary storage
> - you use more than one NFS primary storage
> 
> Even worse: imagine you only use local storage as primary storage, but have 1 NFS configured
> (as the UI “wizard” forces you to configure one). You don’t have any active VM configured
> on the primary storage. You then perform maintenance on the NFS storage, and take it offline…
> 
> All your hosts will then reboot, resulting in major downtime that’s completely unnecessary.
> There’s not even an option to disable this at this point… We’ve removed the reboot instructions
> from the HA script on all our instances…
> 
> Regards,
> 
> Frank
> 
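
To make the local-storage scenario in the quoted message concrete, here is a minimal sketch of a check that would only reboot a host when a running VM actually depends on the storage pool that lost its heartbeat. This is a hypothetical policy for illustration, not how CloudStack behaves; the `Vm` type and `should_reboot_on_heartbeat_loss` are made-up names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vm:
    """Hypothetical, simplified view of a VM on a host."""
    name: str
    pool: str      # identifier of the primary storage pool backing the VM
    running: bool


def should_reboot_on_heartbeat_loss(failed_pool: str, vms: Iterable[Vm]) -> bool:
    """Return True only if a running VM uses the pool that failed.

    Assumption: a host with no running VM on the failed pool cannot
    corrupt data on it, so rebooting that host gains nothing.
    """
    return any(vm.running and vm.pool == failed_pool for vm in vms)


# Host using only local storage while the unused NFS pool is offline.
local_only = [Vm("web-1", "local", True), Vm("db-1", "local", True)]
reboot_needed = should_reboot_on_heartbeat_loss("nfs-primary", local_only)
```

With the example data, no running VM lives on `nfs-primary`, so under this assumed policy the host would stay up during NFS maintenance.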
