cloudstack-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Riepl, Gregor (SWISS TXT)" <Gregor.Ri...@swisstxt.ch>
Subject Re: Virtual routers randomly rebooting
Date Tue, 06 Aug 2019 15:43:39 GMT
Hi Nick,

This might not be relevant for Xen, but we've had problems with memory leaks on the VRs on
VMware when balloon memory was enabled.

A while ago, we built a custom router monitoring setup via SSH for our environment, because
CloudStack doesn't give us enough information about router status. This caused the VR kernel
to leak memory, and the router to reboot suddenly when memory was used up.

The issue was fixed by several memory management optimisations on the system VM template (done
by René, Rohit and Angus, if I remember correctly) and by setting an OS type that would cause
VMware to completely disable balloon memory.

It's possible that you have a similar issue - can you monitor the affected VRs for a while
and see if the reboots are caused by a memory leak?
We still see memory being used up slowly, but when a critical threshold (~98%) is reached,
the kernel will garbage collect it.

Regards,
Gregor
________________________________
From: Nick Thompson <Nick.Thompson@neos.co.nz>
Sent: 06 August 2019 06:01
To: 'users@cloudstack.apache.org' <users@cloudstack.apache.org>
Subject: RE: Virtual routers randomly rebooting

Hey,

Thanks Andrija,

>From what I have read I didn't need to add the hypervisor mapping 7.5 as CloudStack only
looks at the version number rather than the name of the hypervisor at this stage (e.g. XenServer
vs XCP-ng). Also the issue was happening in XenServer 6.5 anyway and CloudStack isn't having
any problems controlling XCP-ng.

>From what I have seen so far only some VPCs are randomly rebooting (on different hosts
too), Storage and Console and standard network VMs seem to be fine (however I have a lot more
VPCs than any other network type). I'm not sure if the VM is rebooting itself or if the Management
Server is having an issue communicating with the VM so shutting it down and restarting it.

Is there a way to disable checks on VPCs so it doesn't try and restart the router VM? I have
found/tried network.router.EnableServiceMonitoring = NO but from the documentation it notes
that VPC networks are not supported.

Any suggestions in what I could try/look into would greatly be appreciated.

Regards,
Nick Thompson


-----Original Message-----
From: Andrija Panic [mailto:andrija.panic@gmail.com]
Sent: Wednesday, 17 July 2019 6:24 a.m.
To: users <users@cloudstack.apache.org>; Rohit Yadav <rohit.yadav@shapeblue.com>
Subject: Re: Virtual routers randomly rebooting

Have you added os/hypervisor mappings inside the DB? I vaguely remember 7.5 not having needed
mapping and was considered to be 6.5, thus a manual fix was needed.

Perhaps Rohit can sched some light?

Anyways, a full log would be great (pastebin or other online service please).

Regards

On Tue, 16 Jul 2019, 00:59 Nick Thompson, <Nick.Thompson@neos.co.nz> wrote:

> Hey,
>
> Since we upgraded to the 4.11 branch (currently 4.11.3) and virtual
> routers have become HVM on XenServer/XCP-ng we have had problems with
> the virtual routers randomly rebooting themselves. We still have some
> running in the older paravirtualized mode and they seem to be fine (it
> may be that the Management server can't communicate to these virtual
> routers since they are an older template?). Other running Windows/Linux VMs are fine.
>
> CloudStack Cluster: XCP-ng 7.5 (previously XenServer 6.5, same issue
> was
> happening)
> CloudStack: 4.11.3 (same issue in 4.11.2, was working fine in V4.9.3
>
> When digging through the management-server.log, I have found the
> following;
>
> >grep -n "Error while collecting network stats from router"
> /var/log/cloudstack/management/management-server.log
> 919060:2019-07-16 10:28:43,756 WARN
> [c.c.n.r.VirtualNetworkApplianceManagerImpl]
> (RouterMonitor-1:ctx-1931d792)
> (logid:6e236728) Error while collecting network stats from router:
> r-2280-VM from host: 71; details: Exception: java.lang.Exception
> 919328:2019-07-16 10:28:50,940 WARN
> [c.c.n.r.VirtualNetworkApplianceManagerImpl]
> (RouterMonitor-1:ctx-1931d792)
> (logid:6e236728) Error while collecting network stats from router:
> r-2280-VM from host: 71; details: Exception: java.lang.Exception
> 920533:2019-07-16 10:29:54,768 WARN
> [c.c.n.r.VirtualNetworkApplianceManagerImpl]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Error while collecting network stats from router:
> r-2280-VM from host: 71; details: Exception: java.lang.Exception
> 920621:2019-07-16 10:30:01,952 WARN
> [c.c.n.r.VirtualNetworkApplianceManagerImpl]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Error while collecting network stats from router:
> r-2280-VM from host: 71; details: Exception: java.lang.Exception
>
> >less +920621 /var/log/cloudstack/management/management-server.log
> 2019-07-16 10:30:01,952 DEBUG [c.c.a.t.Request]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Seq 71-7505811728966836555: Received:  { Ans: , MgmtId:
> 226842157555374, via: 71(hostname), Ver: v1, Flags: 10, {
> NetworkUsageAnswer } }
> 2019-07-16 10:30:01,952 DEBUG [c.c.a.m.AgentManagerImpl]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Details from executing class
> com.cloud.agent.api.NetworkUsageCommand: Exception:
> java.lang.Exception
> Message:  vpc network usage plugin call failed
> Stack: java.lang.Exception:  vpc network usage plugin call failed
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.executeNetworkUsage(XenServer56NetworkUsageCommandWrapper.java:84)
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.execute(XenServer56NetworkUsageCommandWrapper.java:41)
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.execute(XenServer56NetworkUsageCommandWrapper.java:33)
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixRequestWrapper.execute(CitrixRequestWrapper.java:122)
>         at
> com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:1737)
>         at
> com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:315)
>         at
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
> 2019-07-16 10:30:01,952 WARN
> [c.c.n.r.VirtualNetworkApplianceManagerImpl]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Error while collecting network stats from router:
> r-2280-VM from host: 71; details: Exception: java.lang.Exception
> Message:  vpc network usage plugin call failed
> Stack: java.lang.Exception:  vpc network usage plugin call failed
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.executeNetworkUsage(XenServer56NetworkUsageCommandWrapper.java:84)
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.execute(XenServer56NetworkUsageCommandWrapper.java:41)
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.execute(XenServer56NetworkUsageCommandWrapper.java:33)
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixRequestWrapper.execute(CitrixRequestWrapper.java:122)
>         at
> com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:1737)
>         at
> com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:315)
>         at
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
> 2019-07-16 10:30:01,973 DEBUG [c.c.a.t.Request]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Seq 71-7505811728966836557: Sending  { Cmd , MgmtId:
> 226842157555374, via: 71(hostname), Ver: v1, Flags: 100011,
> [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"checkBeforeClean
> up":false,"controlIp":"169.254.0.200","forceStop":true,"volumesToDisco
> nnect":[],"vmName":"r-2280-VM","executeInSequence":false,"wait":0}}]
> }
> 2019-07-16 10:30:01,973 DEBUG [c.c.a.t.Request]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Seq 71-7505811728966836557: Executing:  { Cmd , MgmtId:
> 226842157555374, via: 71(hostname), Ver: v1, Flags: 100011,
> [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"checkBeforeClean
> up":false,"controlIp":"169.254.0.200","forceStop":true,"volumesToDisco
> nnect":[],"vmName":"r-2280-VM","executeInSequence":false,"wait":0}}]
> }
> 2019-07-16 10:30:01,973 DEBUG [c.c.a.m.DirectAgentAttache]
> (DirectAgent-430:ctx-77c57645) (logid:8733e444) Seq 71-7505811728966836557:
> Executing request
> 2019-07-16 10:30:01,995 DEBUG [c.c.h.x.r.w.x.CitrixStopCommandWrapper]
> (DirectAgent-430:ctx-77c57645) (logid:2b94fd5d) 9. The VM r-2280-VM is
> in Stopping state
> 2019-07-16 10:30:02,303 DEBUG [c.c.a.m.DirectAgentAttache]
> (DirectAgent-390:ctx-3b9a78cd) (logid:27f6ec94) Seq 67-8459448950062117779:
> Response Received
>
>
> Any thoughts would be greatly appreciated.
>
> Cheers,
> Nick.
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message