kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From t...@apache.org
Subject incubator-kudu git commit: Only release memory to OS when overhead is unacceptable
Date Tue, 12 Jan 2016 03:51:24 GMT
Repository: incubator-kudu
Updated Branches:
  refs/heads/master b14ea5165 -> 815b02dba


Only release memory to OS when overhead is unacceptable

Previously, we would call MallocExtension::ReleaseFreeMemory() after
every 128MB of memory released by MemTrackers. This had two issues:

1) it releases _all_ free memory from the tcmalloc heap, which causes
unnecessary minor page faults later

2) it holds the tcmalloc page heap lock, preventing other threads from
accessing central free lists[1]

Looking at a trace of an Impala query that hit OS cache but missed the
Kudu cache, I saw periodic blips where all scan RPCs would block for
20ms several time per second. After adding a trace span around the
tcmalloc call, it was clear that this was the culprit. This workload
triggers a lot of allocations and frees due to block cache churn.

To fix, this patch introduces a new flag for the acceptable ratio of
free memory compared to in-use memory on the heap, defaulting to 10%.
The MemTracker now only releases memory back if this threshold is
exceeded, and does so in small increments of 1MB at a time, such that
the tcmalloc-internal lock is never held for too long.

To test the effects of the patch, I looked at percentiles of the Scan
RPC on this workload before and after the patch:

Before: {'max': 14527, 'p999': 8895, 'p50': 3279, 'p95': 5759, 'p99': 6783}
After:  {'max': 14527, 'p999': 8447, 'p50': 3167, 'p95': 5471, 'p99': 6399}

It seems to have improved the tail latencies by about 5%. Other
workloads may improve more or less.

[1] https://github.com/gperftools/gperftools/issues/754

Change-Id: I184c9b7c64a57d4cd178fe7cfb2c548e323edecb
Reviewed-on: http://gerrit.cloudera.org:8080/1711
Reviewed-by: Adar Dembo <adar@cloudera.com>
Tested-by: Internal Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/815b02db
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/815b02db
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/815b02db

Branch: refs/heads/master
Commit: 815b02dbac8e2075c5de47931e1838531e9e9e35
Parents: b14ea51
Author: Todd Lipcon <todd@cloudera.com>
Authored: Tue Jan 5 18:21:07 2016 -0800
Committer: Todd Lipcon <todd@apache.org>
Committed: Tue Jan 12 03:40:06 2016 +0000

----------------------------------------------------------------------
 src/kudu/util/mem_tracker.cc | 53 ++++++++++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 9 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/815b02db/src/kudu/util/mem_tracker.cc
----------------------------------------------------------------------
diff --git a/src/kudu/util/mem_tracker.cc b/src/kudu/util/mem_tracker.cc
index 4895269..9206a2b 100644
--- a/src/kudu/util/mem_tracker.cc
+++ b/src/kudu/util/mem_tracker.cc
@@ -30,6 +30,7 @@
 #include "kudu/gutil/strings/human_readable.h"
 #include "kudu/gutil/strings/substitute.h"
 #include "kudu/util/debug-util.h"
+#include "kudu/util/debug/trace_event.h"
 #include "kudu/util/env.h"
 #include "kudu/util/flag_tags.h"
 #include "kudu/util/mutex.h"
@@ -55,6 +56,13 @@ DEFINE_int32(memory_limit_warn_threshold_percentage, 98,
              "consume before WARNING level messages are periodically logged.");
 TAG_FLAG(memory_limit_warn_threshold_percentage, advanced);
 
+#ifdef TCMALLOC_ENABLED
+DEFINE_int32(tcmalloc_max_free_bytes_percentage, 10,
+             "Maximum percentage of the RSS that tcmalloc is allowed to use for "
+             "reserved but unallocated memory.");
+TAG_FLAG(tcmalloc_max_free_bytes_percentage, advanced);
+#endif
+
 namespace kudu {
 
 // NOTE: this class has been adapted from Impala, so the code style varies
@@ -77,8 +85,8 @@ static GoogleOnceType root_tracker_once = GOOGLE_ONCE_INIT;
 // is greater than GC_RELEASE_SIZE, this will trigger a tcmalloc gc.
 static Atomic64 released_memory_since_gc;
 
-// Validate that soft limit is a percentage.
-static bool ValidateSoftLimit(const char* flagname, int value) {
+// Validate that various flags are percentages.
+static bool ValidatePercentage(const char* flagname, int value) {
   if (value >= 0 && value <= 100) {
     return true;
   }
@@ -86,18 +94,26 @@ static bool ValidateSoftLimit(const char* flagname, int value) {
                            flagname, value);
   return false;
 }
-static bool dummy = google::RegisterFlagValidator(
-    &FLAGS_memory_limit_soft_percentage, &ValidateSoftLimit);
+static bool dummy[] = {
+  google::RegisterFlagValidator(&FLAGS_memory_limit_soft_percentage, &ValidatePercentage),
+  google::RegisterFlagValidator(&FLAGS_memory_limit_warn_threshold_percentage, &ValidatePercentage)
+#ifdef TCMALLOC_ENABLED
+  ,google::RegisterFlagValidator(&FLAGS_tcmalloc_max_free_bytes_percentage, &ValidatePercentage)
+#endif
+};
 
 #ifdef TCMALLOC_ENABLED
-static uint64_t GetTCMallocCurrentAllocatedBytes() {
+static int64_t GetTCMallocProperty(const char* prop) {
   size_t value;
-  if (!MallocExtension::instance()->GetNumericProperty(
-      "generic.current_allocated_bytes", &value)) {
-    LOG(DFATAL) << "Failed to get tcmalloc current allocated bytes";
+  if (!MallocExtension::instance()->GetNumericProperty(prop, &value)) {
+    LOG(DFATAL) << "Failed to get tcmalloc property " << prop;
   }
   return value;
 }
+
+static int64_t GetTCMallocCurrentAllocatedBytes() {
+  return GetTCMallocProperty("generic.current_allocated_bytes");
+}
 #endif
 
 void MemTracker::CreateRootTracker() {
@@ -473,7 +489,26 @@ bool MemTracker::GcMemory(int64_t max_consumption) {
 void MemTracker::GcTcmalloc() {
 #ifdef TCMALLOC_ENABLED
   released_memory_since_gc = 0;
-  MallocExtension::instance()->ReleaseFreeMemory();
+  TRACE_EVENT0("process", "MemTracker::GcTcmalloc");
+
+  // Number of bytes in the 'NORMAL' free list (i.e reserved by tcmalloc but
+  // not in use).
+  int64_t bytes_overhead = GetTCMallocProperty("tcmalloc.pageheap_free_bytes");
+  // Bytes allocated by the application.
+  int64_t bytes_used = GetTCMallocCurrentAllocatedBytes();
+
+  int64_t max_overhead = bytes_used * FLAGS_tcmalloc_max_free_bytes_percentage / 100.0;
+  if (bytes_overhead > max_overhead) {
+    int64_t extra = bytes_overhead - max_overhead;
+    while (extra > 0) {
+      // Release 1MB at a time, so that tcmalloc releases its page heap lock
+      // allowing other threads to make progress. This still disrupts the current
+      // thread, but is better than disrupting all.
+      MallocExtension::instance()->ReleaseToSystem(1024 * 1024);
+      extra -= 1024 * 1024;
+    }
+  }
+
 #else
   // Nothing to do if not using tcmalloc.
 #endif


Mime
View raw message