mesos-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [mesos] kamaradclimber commented on a change in pull request #388: Fixed a bug where the cgroup task killer leaves the cgroup frozen.
Date Mon, 24 May 2021 13:39:40 GMT

kamaradclimber commented on a change in pull request #388:
URL: https://github.com/apache/mesos/pull/388#discussion_r637951921



##########
File path: src/linux/cgroups.cpp
##########
@@ -1403,9 +1403,15 @@ class TasksKiller : public Process<TasksKiller>
 protected:
   void initialize() override
   {
-    // Stop when no one cares.
-    promise.future().onDiscard(lambda::bind(
-        static_cast<void (*)(const UPID&, bool)>(terminate), self(), true));
+    // We don't want to stop immediately upon discard, because
+    // it could leave the cgroup frozen which means that processes
+    // are stuck in uninterruptible sleep (D state), which is quite bad.
+    // So upon discard we still do our best and keep trying to
+    // kill the cgroup for up to FREEZE_RETRY_INTERVAL which should be
+    // a reasonable upper bound.
+    promise.future().onDiscard([this]() {
+      delay(FREEZE_RETRY_INTERVAL, self(), &Self::selfTerminate);

Review comment:
       Would there be a way to skip this extra delay when we detect that the cgroup is not frozen at the time of the discard?
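
       To make the question concrete, here is a rough, standalone sketch of the kind of check I have in mind. It is not based on the Mesos or libprocess APIs: `FreezerState`, `parseFreezerState`, `readFreezerState` and `discardDelay` are hypothetical names, and it assumes a cgroup v1 freezer hierarchy where `freezer.state` contains `THAWED`, `FREEZING` or `FROZEN`.

```cpp
#include <chrono>
#include <fstream>
#include <string>

// Hypothetical type for illustration only; not part of Mesos.
enum class FreezerState { THAWED, FREEZING, FROZEN, UNKNOWN };

// Maps the contents of a cgroup v1 `freezer.state` file to a state.
// Trailing whitespace (the kernel appends a newline) is ignored.
FreezerState parseFreezerState(std::string contents)
{
  while (!contents.empty() &&
         (contents.back() == '\n' || contents.back() == '\r' ||
          contents.back() == ' ' || contents.back() == '\t')) {
    contents.pop_back();
  }

  if (contents == "THAWED") return FreezerState::THAWED;
  if (contents == "FREEZING") return FreezerState::FREEZING;
  if (contents == "FROZEN") return FreezerState::FROZEN;
  return FreezerState::UNKNOWN;
}

// Reads `<cgroupPath>/freezer.state`. Returns UNKNOWN if the file
// cannot be opened or read, so callers fall back to the safe path.
FreezerState readFreezerState(const std::string& cgroupPath)
{
  std::ifstream file(cgroupPath + "/freezer.state");
  std::string contents;
  if (!file || !std::getline(file, contents)) {
    return FreezerState::UNKNOWN;
  }
  return parseFreezerState(contents);
}

// Picks how long to wait before terminating after a discard:
// no wait if the cgroup is known to be thawed, otherwise the full
// retry interval (the conservative choice for FREEZING, FROZEN and
// UNKNOWN).
std::chrono::seconds discardDelay(
    FreezerState state,
    std::chrono::seconds retryInterval)
{
  return state == FreezerState::THAWED
    ? std::chrono::seconds(0)
    : retryInterval;
}
```

       With something like this, the discard handler could pass the result of `discardDelay(...)` to `delay` instead of always using FREEZE_RETRY_INTERVAL. One caveat: the check is racy, since a freeze could start right after the state is read, so the state would probably need to be read from within the TasksKiller process itself.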




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


