cloudstack-users mailing list archives

From Andrija Panic <andrija.pa...@gmail.com>
Subject Re: upgraded XenServer host stays in Alert state
Date Mon, 25 Nov 2019 01:43:04 GMT
An old topic, but I ended up having to do the same thing (some testing) - so
here is the procedure for you to test!

Upgrade a 6.5 host to 7.0 via CD/ISO (I ASSUME the same will happen from 7.0
to 7.1, etc.) - as Dag mentioned, this is now effectively a brand new server
(your original files from the / partition are moved to the backup
partition). The problem is that ACS knows this host was already part of the
cluster, so it will not attempt to copy over the files that it normally
copies when a new host is added - so let's do that manually, from the backup
partition to the correct live filesystem!

Put the host in maintenance mode in ACS - this will live-migrate all VMs away,
so the host is empty.
Put the host in maintenance mode on the XenServer side as well (on the host
itself or via XenCenter).
Reboot, boot from the new ISO/CD, upgrade XS, boot into the new OS, then exit
maintenance mode in XenCenter/XenServer only (not in ACS yet).

Now the fun begins
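### This is effectively a new host (upgrade via ISO/CD), so let's add back
### ALL the missing parts (that are created when a new XS host is added to ACS)
### The backslash before the "cp" command is NOT a typo: it bypasses any shell
### alias for cp (e.g. "cp -i"), so existing files are overwritten without a prompt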

# Mount your backup partition on /mnt/ (e.g. "mount /dev/sda2 /mnt/") - when
# listing the /mnt folder, you'll see the whole file system from before the upgrade
# Make the log file
mkdir /var/log/cloud
touch /var/log/cloud/cloud.log

#copy plugins from backup partition to correct folder
\cp /mnt/etc/xensource/cloudstack_plugins.conf /etc/xensource/
cd /mnt/etc/xapi.d/plugins/; \cp -t /etc/xapi.d/plugins/ \
  cloud-plugin-storage cloudstack_pluginlib.py cloudstack_pluginlib.pyc \
  ovs-pvlan ovstunnel ovs-vif-flows.py s3xenserver swiftxenserver vmops \
  vmopspremium vmopsSnapshot

# copy needed scripts
mkdir -p /opt/cloud/bin/
\cp /mnt/opt/cloud/bin/* /opt/cloud/bin/

#copy udev rules
\cp /mnt/etc/udev/rules.d/xen-ovs-vif-flows.rules /etc/udev/rules.d

#copy logrotate files
\cp /mnt/etc/cron.hourly/logrotate /etc/cron.hourly/
\cp /mnt/etc/logrotate.d/cloudlog /etc/logrotate.d/

#copy ssh key
\cp -f /mnt/root/.ssh/id_rsa.cloud /root/.ssh/

#systemvm.iso
\cp /mnt/opt/xensource/packages/iso/systemvm.iso /opt/xensource/packages/iso/

Unless there are typos above, everything should be in its place!
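
To double-check the copy steps above before touching the database, a small
check like the following could be run on the upgraded host. It is only a
sketch: the file list is taken from the commands above, the function name
check_restored_files and its optional root-prefix argument are made up for
illustration, and /opt/cloud/bin/ is not checked because its contents were
copied with a wildcard.

```shell
# Report which of the files restored above are missing.
# $1 is an optional (hypothetical) root prefix, so the check can be dry-run
# against a scratch directory; leave it empty on the real host.
check_restored_files() {
    local root="${1:-}"
    local missing=0
    local f p
    local plugins="cloud-plugin-storage cloudstack_pluginlib.py cloudstack_pluginlib.pyc
                   ovs-pvlan ovstunnel ovs-vif-flows.py s3xenserver swiftxenserver
                   vmops vmopspremium vmopsSnapshot"
    local files="/var/log/cloud/cloud.log
                 /etc/xensource/cloudstack_plugins.conf
                 /etc/udev/rules.d/xen-ovs-vif-flows.rules
                 /etc/cron.hourly/logrotate
                 /etc/logrotate.d/cloudlog
                 /root/.ssh/id_rsa.cloud
                 /opt/xensource/packages/iso/systemvm.iso"

    # Add every plugin from the copy command to the list of expected files.
    for p in $plugins; do
        files="$files /etc/xapi.d/plugins/$p"
    done

    for f in $files; do
        if [ ! -e "${root}${f}" ]; then
            echo "MISSING: ${f}"
            missing=$((missing + 1))
        fi
    done

    echo "missing=${missing}"
    [ "$missing" -eq 0 ]
}

# Usage on the upgraded host (no prefix):
#   check_restored_files
```

Anything reported as MISSING can be copied again from /mnt before continuing.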

Next, in the ACS database, run:

UPDATE `cloud`.`host` SET `mgmt_server_id`=NULL WHERE `id`=<ID-OF-XENSERVER-HOST>;

Here ^^^ we are setting the host's mgmt server to NULL - so that ACS will later
consider this host as "not owned" by any mgmt server and will try to connect
to it.
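
Since the statement above is typed by hand with the host id filled in, a small
wrapper that refuses anything but a plain number may save a typo. This is a
sketch only: build_reset_sql is a made-up helper name, the id 71 is just the
agent id that appears in Yiping's log further down, and the mysql user "cloud"
and the columns in the lookup query are assumptions to check against your own
database.

```shell
# Print the UPDATE statement for a numeric host id; reject anything else.
build_reset_sql() {
    case "$1" in
        ''|*[!0-9]*)
            echo "host id must be a plain number" >&2
            return 1
            ;;
    esac
    printf 'UPDATE `cloud`.`host` SET `mgmt_server_id`=NULL WHERE `id`=%s;\n' "$1"
}

# Example (not run here; user name and column names are assumptions):
#   mysql -u cloud -p -e 'SELECT id, name, status, mgmt_server_id FROM cloud.host;'
#   build_reset_sql 71 | mysql -u cloud -p
```

Looking up the id first and piping only the generated statement keeps the
change limited to the single host row.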

Exit maintenance mode for the host in ACS, and tail the log (tailf or
tail -f) - you should see lines like:

Found 1 unmanaged direct hosts, processing connect for them
...
...
 .... Executing cmd: rm -f /opt/xensource/sm/hostvmstats.py ........ ....
...... ......
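
If you would rather not watch the log by eye, something like this can classify
a slice of it. It is a rough sketch: check_reconnect_log is a made-up helper,
the two patterns are simply the messages quoted in this thread, and the log
path in the usage comment is an assumption (the original report used
catalina.out).

```shell
# Classify a log file (or a slice of one) by the messages quoted in this thread.
check_reconnect_log() {
    local log="$1"
    if grep -q 'callHostPlugin failed for cmd: setIptables' "$log"; then
        # The plugin error from the original report is still present.
        echo "plugin-error"
    elif grep -q 'unmanaged direct hosts, processing connect for them' "$log"; then
        # ACS has picked the host up as unmanaged and is connecting to it.
        echo "reconnect-started"
    else
        echo "no-match"
    fi
}

# Usage (path is an assumption; only look at lines written after the host
# left maintenance mode, otherwise old errors will match):
#   tail -n 500 /var/log/cloudstack/management/management-server.log > /tmp/recent.log
#   check_reconnect_log /tmp/recent.log
```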

After that the host will be reconnected, without the error you were seeing
before - "...callHostPlugin failed for cmd: setIptables with args...."

Let me know if that works - I'll need to test this AGAIN end-to-end and on
multiple hosts. I will probably advise that all slave servers are upgraded one
by one by first removing the host from ACS completely, so that ACS later
properly does all this file copying from scratch (what we have been doing
here manually). The manual work you'll still be running on the master (since
this one is upgraded first).

Best
Andrija

On Wed, 28 Mar 2018 at 07:18, Kristian Liivak <kris@wavecom.ee> wrote:

> Hi Yiping
>
> I once updated a Xen cluster following that manual, but I put the
> files in the correct paths. The manual is indeed outdated.
>
> Another option is to remove the hosts from CS before the upgrade and add
> them back afterwards. That will also copy the necessary files to the hosts.
>
> Otherwise the hosts and CS cannot communicate. A snapshot of CS and the host
> backup partition will help you revert to the old state.
>
> Lugupidamisega / Regards
>
> Kristian
>
> ----- Original Message -----
> From: "Yiping Zhang" <yzhang@marketo.com>
> To: "users" <users@cloudstack.apache.org>
> Sent: Tuesday, March 27, 2018 9:14:00 PM
> Subject: Re: upgraded XenServer host stays in Alert state
>
> Hi, Kristian:
>
> Thanks for the link.  I have checked it out, but its contents are quite
> dated, even though the link itself implies it is for ACS 4.11.
> The doc does not mention whether it covers the case of upgrading from
> XenServer 6.5SP1 to 7.0 with ACS version 4.9, so I am somewhat reluctant to
> follow it as is.  Besides, I have already upgraded two separate XenServer
> clusters without any issues by following my own current process,
> including one cluster on the current ACS instance.
>
> Yiping
>
> On 3/27/18, 12:37 AM, "Kristian Liivak" <kris@wavecom.ee> wrote:
>
>     Hi
>
>     It's a really good question. I ran into a similar issue.
>     But did you follow the Xen upgrade instructions at the end of
> http://docs.cloudstack.apache.org/projects/cloudstack-installation/en/4.11/hypervisor/xenserver.html
>
>     As far as I remember, some of the paths to copy files to have changed
> and have not been updated in the documentation.
>
>
>     Lugupidamisega / Regards
>
>     Kristian Liivak
>
>     WaveCom As
>     Endla 16, 10142 Tallinn
>     Estonia
>     Tel: +3726850001
>     Gsm: +37256850001
>     E-mail: kris@wavecom.ee
>     Skype: kristian.liivak
>     http://www.wavecom.ee
>     http://www.facebook.com/wavecom.ee
>
>     ----- Original Message -----
>     From: "Yiping Zhang" <yzhang@marketo.com>
>     To: "users" <users@cloudstack.apache.org>
>     Sent: Monday, March 26, 2018 11:47:24 PM
>     Subject: upgraded XenServer host stays in Alert state
>
>     Hi, all:
>
>
>
>     I am upgrading my ACS clusters from XenServer 6.5 to XenServer 7.0.  I
> am on ACS version 4.9.3.0. On this ACS instance, I have another fully
> functioning XenServer 7.0 cluster already.
>
>
>
>     This time, after I upgraded the pool master, it remains in “Alert”
> state, while all the slave hosts eventually reach “Up” state. Attempts to
> reconnect the host (via UI or API) or to restart the management service
> have no effect.
>
>
>
>     Looking at the catalina.out log, there is an error executing the following
> command on the host:  xe sm-list | grep "resigning of duplicates". What
> exactly does this command do, and how do I fix it?
>
>
>
>     Note:  I did a manual upgrade of the pool master (from XenServer 7.0
> ISO image), in order to keep the existing partition table and cluster
> configurations, and following are the error logs from catalina.out file:
>
>
>
>     Yiping
>
>
>
>
>
>
>
>     INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c)
> XenServer Version is 7.0.0 for host 10.0.1.18
>
>     INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c)
> Private Network is mgmt for host 10.0.1.18
>
>     INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c)
> Guest Network is mgmt for host 10.0.1.18
>
>     INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c)
> Public Network is mgmt for host 10.0.1.18
>
>     ERROR [c.c.u.s.SshHelper] (AgentTaskPool-11:ctx-3ef0dede) SSH
> execution of command xe sm-list | grep "resigning of duplicates" has an
> error status code in return. Result output:
>
>     INFO  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-11:ctx-3ef0dede)
> Host: xxxxxxxx connected with hypervisor type: XenServer. Checking CIDR...
>
>     INFO  [c.c.a.m.DirectAgentAttache] (AgentTaskPool-11:ctx-3ef0dede)
> StartupAnswer received 71 Interval = 60
>
>     WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-11:ctx-3ef0dede)
> defaulting to xenserver650 resource for product brand: XenServer with
> product version: 7.0.0
>
>     INFO  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-219:ctx-c04388fd)
> Host 10.0.1.18 OpaqueRef:3a71d366-1db2-b082-93e0-73a70dd9d409: Host
> 10.0.1.18 is already setup.
>
>     INFO  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-219:ctx-c04388fd)
> Host 10.0.1.18 OpaqueRef:3a71d366-1db2-b082-93e0-73a70dd9d409: Host
> 10.0.1.18 is already setup.
>
>     WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-219:ctx-c04388fd)
> callHostPlugin failed for cmd: setIptables with args  due to The requested
> plugin could not be found.
>
>     WARN  [c.c.h.x.r.w.x.CitrixSetupCommandWrapper]
> (DirectAgent-219:ctx-c04388fd) Unable to setup
>
>     com.cloud.utils.exception.CloudRuntimeException: callHostPlugin failed
> for cmd: setIptables with args  due to The requested plugin could not be
> found.
>
>         at
> com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.callHostPlugin(CitrixResourceBase.java:340)
>
>         at
> com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.setIptables(CitrixResourceBase.java:4555)
>
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixSetupCommandWrapper.execute(CitrixSetupCommandWrapper.java:63)
>
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixSetupCommandWrapper.execute(CitrixSetupCommandWrapper.java:45)
>
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixRequestWrapper.execute(CitrixRequestWrapper.java:122)
>
>         at
> com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:1693)
>
>         at
> com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:315)
>
>         at
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>
>         at
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
>         at java.lang.Thread.run(Thread.java:745)
>
>     WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-11:ctx-3ef0dede)
> Unable to setup agent 71 due to callHostPlugin failed for cmd: setIptables
> with args  due to The requested plugin could not be found.
>
>     INFO  [c.c.u.e.CSExceptionErrorCode] (AgentTaskPool-11:ctx-3ef0dede)
> Could not find exception: com.cloud.exception.ConnectionException in error
> code list for exceptions
>
>     WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-11:ctx-3ef0dede)
> Monitor XcpServerDiscoverer says there is an error in the connect process
> for 71 due to Reinitialize agent after setup.
>
>     INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-11:ctx-3ef0dede) Host
> 71 is disconnecting with event AgentDisconnected
>
>     WARN  [c.c.r.ResourceManagerImpl] (AgentTaskPool-11:ctx-3ef0dede)
> Unable to connect due to
>
>     com.cloud.exception.ConnectionException: Reinitialize agent after
> setup.
>
>         at
> com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer.processConnect(XcpServerDiscoverer.java:627)
>
>         at
> com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:567)
>
>         at
> com.cloud.agent.manager.AgentManagerImpl.handleDirectConnectAgent(AgentManagerImpl.java:1521)
>
>         at
> com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:1909)
>
>         at
> com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:2042)
>
>         at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
>
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>         at java.lang.reflect.Method.invoke(Method.java:606)
>
>         at
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>
>         at
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>
>         at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>
>         at
> org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>
>         at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>
>         at
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>
>         at com.sun.proxy.$Proxy160.createHostAndAgent(Unknown Source)
>
>         at
> com.cloud.agent.manager.AgentManagerImpl$SimulateStartTask.runInContext(AgentManagerImpl.java:1138)
>
>         at
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>
>         at
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
>         at java.lang.Thread.run(Thread.java:745)
>


-- 

Andrija Panić
