cloudstack-users mailing list archives

From Richard Lawley <rich...@richardlawley.com>
Subject Re: upgraded XenServer host stays in Alert state
Date Mon, 25 Nov 2019 08:31:43 GMT
Andrija,

A simpler solution to this is to force ACS to set the host up again.  To do
this:

Remove the tag which indicates the host has been set up:
xe host-param-remove uuid=HOSTUUID param-name=tags param-key=vmops-version-com.cloud.hypervisor.xenserver.resource.XenServer650R

Then simply restart the management server.  When it connects to that host,
it will perform the initial setup procedures again.

We use this procedure to roll out a customised SystemVM ISO, but it should
also take care of the other ACS modifications to a XenServer HV.
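
The exact tag name may differ between hosts and ACS versions, so here is a minimal sketch that picks out whatever "vmops-version-" tags are present instead of typing the name by hand. The helper name filter_vmops_tags is hypothetical, and the separator used by "xe host-param-get ... param-name=tags" (commas or semicolons) is an assumption you should check on your own host:

```shell
# Hypothetical helper: print every tag beginning with "vmops-version-",
# one per line. $1 is the raw tag list as a single string.
# Assumption: tags are separated by commas or semicolons plus optional spaces.
filter_vmops_tags() {
    printf '%s\n' "$1" \
        | tr ',;' '\n\n' \
        | sed 's/^ *//; s/ *$//' \
        | grep '^vmops-version-' || true
}

# Intended use on the XenServer host (not run here; HOSTUUID is a placeholder):
#   tags=$(xe host-param-get uuid=HOSTUUID param-name=tags)
#   filter_vmops_tags "$tags" | while read -r tag; do
#       xe host-param-remove uuid=HOSTUUID param-name=tags param-key="$tag"
#   done
```

Printing the matching tags first and only then feeding them to host-param-remove makes it easy to review what will be removed before touching the host.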

Regards,

Richard

On Mon, 25 Nov 2019 at 01:43, Andrija Panic <andrija.panic@gmail.com> wrote:

> An old topic, but I had to do the same thing (for some testing), so here is
> the procedure for you to test!
>
> Upgrade a 6.5 host to 7.0 via CD/ISO (I ASSUME the same will happen from 7.0
> to 7.1, etc.). As Dag mentioned, this is now effectively a brand new server
> (your original files from the / partition are moved to the backup
> partition). The problem is that ACS knows this host was already
> part of the cluster, so it will not attempt to copy over the files
> that it copies when adding a new host. So let's do that manually,
> from the backup partition to the correct live filesystem!
>
> Put host in maintenance mode in ACS - this will live migrate away all VMs,
> so the host is empty.
> Put the host in maintenance mode itself (XenServer or via XenCenter)
> Reboot, boot from new ISO/CD, upgrade XS, boot into the new OS, exit
> Maintenance mode in XenCenter/XenServer only.
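
For anyone without XenCenter, the XenServer-side maintenance steps above can be sketched with the xe CLI. This assumes that XenCenter's maintenance mode corresponds to host-disable followed by host-evacuate, and the function names and the DRY_RUN switch are hypothetical:

```shell
# Hypothetical wrapper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Disable the host and move any remaining VMs away (before the upgrade).
# $1 is the XenServer host UUID.
xs_enter_maintenance() {
    run xe host-disable uuid="$1"
    run xe host-evacuate uuid="$1"
}

# Re-enable the host once it has booted into the new version.
xs_exit_maintenance() {
    run xe host-enable uuid="$1"
}
```

Running with DRY_RUN=1 first prints the xe commands without executing them, which is a cheap way to confirm the UUID before changing anything.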
>
> Now the fun begins
> ### This is effectively a new host (upgrade via ISO/CD), so let's add back
> ### ALL the missing parts (that are created when a new XS host is added to ACS)
> ### The backslash before the "cp" command is NOT a typo (it bypasses any shell alias for cp)
>
> # Mount your backup partition on /mnt/ (e.g. "mount /dev/sda2 /mnt/"). When
> # listing the /mnt folder, you'll see the whole file system from before.
> # Create the log file
> mkdir /var/log/cloud
> touch /var/log/cloud/cloud.log
>
> #copy plugins from backup partition to correct folder
> \cp /mnt/etc/xensource/cloudstack_plugins.conf /etc/xensource/
> cd /mnt/etc/xapi.d/plugins/; \cp -t /etc/xapi.d/plugins/ cloud-plugin-storage cloudstack_pluginlib.py cloudstack_pluginlib.pyc ovs-pvlan ovstunnel ovs-vif-flows.py s3xenserver swiftxenserver vmops vmopspremium vmopsSnapshot
>
> # copy needed scripts
> mkdir -p /opt/cloud/bin/
> \cp /mnt/opt/cloud/bin/* /opt/cloud/bin/
>
> #copy udev rules
> \cp /mnt/etc/udev/rules.d/xen-ovs-vif-flows.rules /etc/udev/rules.d
>
> #copy logrotate files
> \cp /mnt/etc/cron.hourly/logrotate /etc/cron.hourly/
> \cp /mnt/etc/logrotate.d/cloudlog /etc/logrotate.d/
>
> #copy ssh key
> \cp -f /mnt/root/.ssh/id_rsa.cloud /root/.ssh/
>
> #systemvm.iso
> \cp /mnt/opt/xensource/packages/iso/systemvm.iso /opt/xensource/packages/iso/
>
> Unless there are typos above, everything should be in its place!
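
A quick way to double-check the copy steps is to list which of the expected paths are still missing. This is only a sketch: the function name is hypothetical, and the path list is taken from the commands above (one representative subset of the plugins, not an authoritative list of what ACS installs):

```shell
# Hypothetical check: print each expected path that is missing under ROOT.
# ROOT defaults to "" (the live filesystem); pass a directory to check a
# different tree, e.g. the mounted backup partition.
list_missing_acs_files() {
    root="${1:-}"
    for p in \
        /var/log/cloud/cloud.log \
        /etc/xensource/cloudstack_plugins.conf \
        /etc/xapi.d/plugins/vmops \
        /etc/xapi.d/plugins/vmopspremium \
        /etc/xapi.d/plugins/vmopsSnapshot \
        /etc/xapi.d/plugins/cloudstack_pluginlib.py \
        /etc/udev/rules.d/xen-ovs-vif-flows.rules \
        /etc/cron.hourly/logrotate \
        /etc/logrotate.d/cloudlog \
        /root/.ssh/id_rsa.cloud \
        /opt/xensource/packages/iso/systemvm.iso
    do
        [ -e "$root$p" ] || printf '%s\n' "$p"
    done
}
```

Calling list_missing_acs_files with no argument on the upgraded host should print nothing if every copy succeeded; calling it with /mnt shows whether the backup partition actually contains each file.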
>
> UPDATE `cloud`.`host` SET `mgmt_server_id`=NULL WHERE `id`=<ID-OF-XENSERVER-HOST>;
> Here ^^^ we are setting the host's mgmt server to NULL in the ACS database,
> so that ACS will later consider this host as "not owned" by any mgmt server
> and will try to connect to it.
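
Since this statement is edited by hand for each host, a small guard against pasting the wrong value can help. The sketch below only builds the statement; the function name is hypothetical, 71 is simply the host id that appears in the original poster's logs, and the mysql user and database names in the usage comment are assumptions:

```shell
# Hypothetical helper: print the UPDATE statement for one host id.
# Rejects anything that is not a plain non-negative integer.
release_host_sql() {
    case "$1" in
        ''|*[!0-9]*) echo "host id must be numeric" >&2; return 1 ;;
    esac
    printf 'UPDATE `cloud`.`host` SET `mgmt_server_id`=NULL WHERE `id`=%s;\n' "$1"
}

# Intended use on the database server (not run here; credentials are placeholders):
#   release_host_sql 71 | mysql -u cloud -p cloud
```

Taking a database backup before running the statement keeps the change easy to revert.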
>
> Exit maintenance mode for the host in ACS, and tail -f the log - you
> should see lines such as:
>
> Found 1 unmanaged direct hosts, processing connect for them
> ...
> ...
>  .... Executing cmd: rm -f /opt/xensource/sm/hostvmstats.py ........ ....
> ...... ......
>
> After that the host will be reconnected, without the error you were seeing
> before - "...callHostPlugin failed for cmd: setIptables with args...."
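
To avoid scrolling through the whole log, the two messages of interest can be filtered out. This is a sketch only: the function name is hypothetical and the log path in the usage comment is an assumption (the original report mentions catalina.out instead):

```shell
# Hypothetical filter: print the log lines that show the reconnect either
# starting ("unmanaged direct hosts") or failing ("callHostPlugin failed").
# $1 is the path to a management server log file.
scan_connect_log() {
    grep -E 'unmanaged direct hosts|callHostPlugin failed' "$1" || true
}

# Intended live use (path is an assumption; adjust to your install):
#   tail -f /var/log/cloudstack/management/management-server.log \
#       | grep -E --line-buffered 'unmanaged direct hosts|callHostPlugin failed'
```

Seeing the "unmanaged direct hosts" line without a following "callHostPlugin failed" line is the outcome described above.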
>
> Let me know if that works. I'll need to test this AGAIN end-to-end and on
> multiple hosts. I would probably advise upgrading all slave servers one
> by one, first removing each host from ACS completely, so that ACS
> later does all this file copying from scratch (what we have been doing
> here manually). The manual work above is what you'll run on the master
> (since that one is upgraded first).
>
> Best
> Andrija
>
> On Wed, 28 Mar 2018 at 07:18, Kristian Liivak <kris@wavecom.ee> wrote:
>
> > Hi Yiping
> >
> > I once updated a Xen cluster according to that manual, but I copied the
> > files to the correct paths. The manual is indeed outdated.
> >
> > Another option is to remove the hosts from CS before the upgrade and add
> > them back after. That will also copy the necessary files to the hosts.
> >
> > Otherwise the hosts and CS cannot communicate. A snapshot of CS and the
> > host backup partition will help you revert to the old state.
> >
> > Lugupidamisega / Regards
> >
> > Kristian
> >
> > ----- Original Message -----
> > From: "Yiping Zhang" <yzhang@marketo.com>
> > To: "users" <users@cloudstack.apache.org>
> > Sent: Tuesday, March 27, 2018 9:14:00 PM
> > Subject: Re: upgraded XenServer host stays in Alert state
> >
> > Hi, Kristian:
> >
> > Thanks for the link.  I have checked it out, but its contents are quite
> > dated, even though the link itself implies it is for ACS 4.11.
> > The doc does not mention whether it covers upgrading from
> > XenServer 6.5SP1 to 7.0 with ACS version 4.9, so I am somewhat reluctant
> > to follow it as is.  Besides, I have already upgraded two separate
> > XenServer clusters without any issues by following my own current process,
> > including one cluster on the current ACS instance.
> >
> > Yiping
> >
> > On 3/27/18, 12:37 AM, "Kristian Liivak" <kris@wavecom.ee> wrote:
> >
> >     Hi
> >
> >     It's a really good question. I ran into a similar issue.
> >     But did you follow the Xen upgrade instructions at the end of
> >     http://docs.cloudstack.apache.org/projects/cloudstack-installation/en/4.11/hypervisor/xenserver.html
> >
> >     As far as I remember, some of the paths the files are copied to have
> >     changed and were not updated in the documentation.
> >
> >
> >     Lugupidamisega / Regards
> >
> >     Kristian Liivak
> >
> >     WaveCom As
> >     Endla 16, 10142 Tallinn
> >     Estonia
> >     Tel: +3726850001
> >     Gsm: +37256850001
> >     E-mail: kris@wavecom.ee
> >     Skype: kristian.liivak
> >     http://www.wavecom.ee
> >     http://www.facebook.com/wavecom.ee
> >
> >     ----- Original Message -----
> >     From: "Yiping Zhang" <yzhang@marketo.com>
> >     To: "users" <users@cloudstack.apache.org>
> >     Sent: Monday, March 26, 2018 11:47:24 PM
> >     Subject: upgraded XenServer host stays in Alert state
> >
> >     Hi, all:
> >
> >     I am upgrading my ACS clusters from XenServer 6.5 to XenServer 7.0. I
> >     am on ACS version 4.9.3.0. On this ACS instance, I have another fully
> >     functioning XenServer 7.0 cluster already.
> >
> >     This time, after I upgraded the pool master, it remains in “Alert”
> >     state, while all the slave hosts eventually are in “Up” state. Attempts
> >     to reconnect the host (via UI or API) or restart the management service
> >     have no effect.
> >
> >     Looking at the catalina.out log, there is an error executing the following
> >     command on the host:  xe sm-list | grep "resigning of duplicates". What
> >     exactly does this command do, and how do I fix it?
> >
> >     Note:  I did a manual upgrade of the pool master (from the XenServer 7.0
> >     ISO image), in order to keep the existing partition table and cluster
> >     configurations. The following are the error logs from the catalina.out file:
> >
> >     Yiping
> >
> >     INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) XenServer Version is 7.0.0 for host 10.0.1.18
> >     INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) Private Network is mgmt for host 10.0.1.18
> >     INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) Guest Network is mgmt for host 10.0.1.18
> >     INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) Public Network is mgmt for host 10.0.1.18
> >     ERROR [c.c.u.s.SshHelper] (AgentTaskPool-11:ctx-3ef0dede) SSH execution of command xe sm-list | grep "resigning of duplicates" has an error status code in return. Result output:
> >     INFO  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-11:ctx-3ef0dede) Host: xxxxxxxx connected with hypervisor type: XenServer. Checking CIDR...
> >     INFO  [c.c.a.m.DirectAgentAttache] (AgentTaskPool-11:ctx-3ef0dede) StartupAnswer received 71 Interval = 60
> >     WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-11:ctx-3ef0dede) defaulting to xenserver650 resource for product brand: XenServer with product version: 7.0.0
> >     INFO  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-219:ctx-c04388fd) Host 10.0.1.18 OpaqueRef:3a71d366-1db2-b082-93e0-73a70dd9d409: Host 10.0.1.18 is already setup.
> >     INFO  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-219:ctx-c04388fd) Host 10.0.1.18 OpaqueRef:3a71d366-1db2-b082-93e0-73a70dd9d409: Host 10.0.1.18 is already setup.
> >     WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-219:ctx-c04388fd) callHostPlugin failed for cmd: setIptables with args  due to The requested plugin could not be found.
> >     WARN  [c.c.h.x.r.w.x.CitrixSetupCommandWrapper] (DirectAgent-219:ctx-c04388fd) Unable to setup
> >     com.cloud.utils.exception.CloudRuntimeException: callHostPlugin failed for cmd: setIptables with args  due to The requested plugin could not be found.
> >         at com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.callHostPlugin(CitrixResourceBase.java:340)
> >         at com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.setIptables(CitrixResourceBase.java:4555)
> >         at com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixSetupCommandWrapper.execute(CitrixSetupCommandWrapper.java:63)
> >         at com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixSetupCommandWrapper.execute(CitrixSetupCommandWrapper.java:45)
> >         at com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixRequestWrapper.execute(CitrixRequestWrapper.java:122)
> >         at com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:1693)
> >         at com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:315)
> >         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
> >         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> >         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> >         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> >         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:745)
> >
> >     WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-11:ctx-3ef0dede) Unable to setup agent 71 due to callHostPlugin failed for cmd: setIptables with args  due to The requested plugin could not be found.
> >     INFO  [c.c.u.e.CSExceptionErrorCode] (AgentTaskPool-11:ctx-3ef0dede) Could not find exception: com.cloud.exception.ConnectionException in error code list for exceptions
> >     WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-11:ctx-3ef0dede) Monitor XcpServerDiscoverer says there is an error in the connect process for 71 due to Reinitialize agent after setup.
> >     INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-11:ctx-3ef0dede) Host 71 is disconnecting with event AgentDisconnected
> >     WARN  [c.c.r.ResourceManagerImpl] (AgentTaskPool-11:ctx-3ef0dede) Unable to connect due to
> >     com.cloud.exception.ConnectionException: Reinitialize agent after setup.
> >         at com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer.processConnect(XcpServerDiscoverer.java:627)
> >         at com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:567)
> >         at com.cloud.agent.manager.AgentManagerImpl.handleDirectConnectAgent(AgentManagerImpl.java:1521)
> >         at com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:1909)
> >         at com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:2042)
> >         at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:606)
> >         at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
> >         at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
> >         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
> >         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
> >         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
> >         at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
> >         at com.sun.proxy.$Proxy160.createHostAndAgent(Unknown Source)
> >         at com.cloud.agent.manager.AgentManagerImpl$SimulateStartTask.runInContext(AgentManagerImpl.java:1138)
> >         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
> >         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> >         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> >         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> >         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:745)
> >
>
>
> --
>
> Andrija Panić
>
