cloudstack-users mailing list archives

From Yordan Kostov <Yord...@NSOGROUP.COM>
Subject RE: slow vm start and dhcp log full?
Date Tue, 10 Aug 2021 09:38:07 GMT
Hey everyone,

	 I figured it out. It was a faulty SFP that caused an IOPS bottleneck, so the VRs could not
write to their log directory, which cascaded into the DHCP outage.
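
For the record, the bottleneck showed up as sustained per-device write throughput. A rough way to see which block device is absorbing the writes, without installing any extra packages, is to sample /proc/diskstats twice. This is only a sketch: the 5-second window is arbitrary, and it assumes a standard Linux guest (field 10 of /proc/diskstats is sectors written, in the kernel's fixed 512-byte units).

```shell
# Sketch: estimate per-device write throughput from /proc/diskstats
# so an IOPS-hungry device stands out. Assumes Linux; 5 s window is arbitrary.
snapshot() { awk '{ print $3, $10 }' /proc/diskstats | sort; }  # name, sectors written

snapshot > /tmp/ds.before
sleep 5
snapshot > /tmp/ds.after

# (after - before) sectors * 512 B / window -> MB/s per device
join /tmp/ds.before /tmp/ds.after |
  awk '{ mb = ($3 - $2) * 512 / 1048576 / 5
         if (mb > 0.01) printf "%-12s %6.2f MB/s written\n", $1, mb }'
```

Anything sustaining tens of MB/s written on an otherwise idle VR is worth a closer look.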

Best regards,
Jordan  

-----Original Message-----
From: Yordan Kostov <YordanK@NSOGROUP.COM> 
Sent: 09 August 2021 14:50
To: users@cloudstack.apache.org
Subject: slow vm start and dhcp log full?



Hello everyone,

                CloudStack 4.15 + XCP-ng 8.2 + virtual router template 4.15. We have only about
15 VMs running, mostly backup tests and people trying things out.

                Recently I noticed quite a bit of sluggishness in our environment. It took about
5-10 minutes to create a new VM or start an existing one.
                One of our networks stopped creating VMs entirely; it seems the virtual router
was not handing out addresses.

After some troubleshooting I found the following issues:

  *   The virtual router that did not hand out IP addresses had its /run/log/journal directory
fill the entire /run partition with logs. It seems that when this happens, the router stops
handing out IP addresses.
  *   The same virtual router, plus one more, were putting a heavy load on the storage (20-25 MB/s),
squeezing out all the IOPS they could get.
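
On the first point, a quick sanity check inside the VR looks like this. It is only a sketch: the 90% threshold and the 16M vacuum target are assumptions on my side, not CloudStack defaults.

```shell
# Sketch: warn when /run (tmpfs) is nearly full and shrink the journal.
# Threshold and target size below are assumptions, not CloudStack defaults.
usage_pct=$(df --output=pcent /run | tail -n 1 | tr -dc '0-9')

if [ "${usage_pct:-0}" -ge 90 ]; then
    echo "/run is ${usage_pct}% full - journald is the usual suspect"
    journalctl --vacuum-size=16M   # trims archived journal files down to ~16M
fi
```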


Let's say issue number one is by design. What causes issue number two?
VR logs (journalctl -p 3 -x --file /run/log/journal/5212989feea04bb6b13843e7b0c9d2b3/system.journal)
show this issue repeating:

Aug 09 11:41:22 r-39-VM systemd[1]: Failed to start User Manager for UID 0.
-- Subject: A start job for unit user@0.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit user@0.service has finished with a failure.
--
-- The job identifier is 588 and the job result is failed.
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM _pam_load_conf_file: unable to open config for /etc/pam.d/null
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM error loading (null)
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM _pam_init_handlers: error reading /etc/pam.d/systemd-user
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM _pam_init_handlers: [Critical error - immediate abort]
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM error reading PAM configuration file
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM pam_start: failed to initialize handlers
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM failed: Critical error - immediate abort
Aug 09 11:41:29 r-39-VM systemd[1607]: user@0.service: Failed to set up PAM session: Operation not permitted
Aug 09 11:41:29 r-39-VM systemd[1607]: user@0.service: Failed at step PAM spawning /lib/systemd/systemd: Operation not permitted
-- Subject: Process /lib/systemd/systemd could not be executed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The process /lib/systemd/systemd could not be executed and failed.
--
-- The error number returned by this process is ERRNO.

                After rebooting the VMs, things are back to normal, at least for now.
                Any advice on why the VRs behave like this and why PAM is complaining?
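
                As a stopgap, I am considering capping journald's volatile storage so /run cannot fill up again. A sketch of /etc/systemd/journald.conf inside the VR (the sizes are guesses on my part, not CloudStack defaults):

```ini
# /etc/systemd/journald.conf (inside the VR) - illustrative fragment
[Journal]
Storage=volatile      # keep the journal in /run only
RuntimeMaxUse=16M     # cap the total size under /run/log/journal
RuntimeKeepFree=32M   # always leave headroom on the tmpfs
```

                After editing, restarting journald (systemctl restart systemd-journald) applies the limits.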

Best regards,
Jordan
