kudu-commits mailing list archives

From wdberke...@apache.org
Subject [1/4] kudu git commit: [docs] Add brief instructions on decommissioning tablet servers
Date Tue, 09 Oct 2018 17:06:59 GMT
Repository: kudu
Updated Branches:
  refs/heads/master 89206ed91 -> 9ec8d28db

[docs] Add brief instructions on decommissioning tablet servers

Change-Id: I4e9ab976390ab6c0d5b8db0da00b27dc031037e5
Reviewed-on: http://gerrit.cloudera.org:8080/11618
Tested-by: Will Berkeley <wdberkeley@gmail.com>
Reviewed-by: Andrew Wong <awong@cloudera.com>
Tested-by: Kudu Jenkins

Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/891a8e3b
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/891a8e3b
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/891a8e3b

Branch: refs/heads/master
Commit: 891a8e3beca417007933bcae7c08067e4a3bdfb8
Parents: 89206ed
Author: Will Berkeley <wdberkeley@gmail.org>
Authored: Mon Oct 8 14:20:45 2018 -0700
Committer: Will Berkeley <wdberkeley@gmail.com>
Committed: Tue Oct 9 14:54:01 2018 +0000

 docs/administration.adoc | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/docs/administration.adoc b/docs/administration.adoc
index b176f58..1c3ab20 100644
--- a/docs/administration.adoc
+++ b/docs/administration.adoc
@@ -1249,3 +1249,30 @@ table, "RF" stands for "replication factor".
 If the rebalancer is running against a cluster where rebalancing replication
 factor one tables is not supported, it will rebalance all the other tables
 and the cluster as if those singly-replicated tables did not exist.
+.Decommissioning or Permanently Removing a Tablet Server From a Cluster
+Kudu does not currently have an automated way to remove a tablet server from
+a cluster permanently. Instead, use the following steps:
+. Ensure the cluster is in good health using `ksck`. See <<ksck>>.
+. If the tablet server contains any replicas of tables with replication factor
+  1, these replicas must be manually moved off the tablet server prior to
+  shutting it down. The `kudu tablet change_config move_replica` tool can be
+  used for this.
+. Shut down the tablet server. After
+  `-follower_unavailable_considered_failed_sec`, which defaults to 5 minutes,
+  Kudu will begin to re-replicate the tablet server's replicas to other servers.
+  Wait until the process is finished. Progress can be monitored using `ksck`.
+. Once all the copies are complete, `ksck` will continue to report the tablet
+  server as unavailable. The cluster will otherwise operate fine without the
+  tablet server. To completely remove it from the cluster so `ksck` shows the
+  cluster as completely healthy, restart the masters. In the case of a single
+  master, this will cause cluster downtime. With multiple masters, restart
+  them one at a time to avoid cluster downtime.
+WARNING: Do not shut down multiple tablet servers at once. To remove multiple
+tablet servers from the cluster, follow the above instructions for each tablet
+server, ensuring that the previous tablet server is removed from the cluster and
+`ksck` is healthy before shutting down the next.
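
The steps in the patch above can be sketched as a small dry-run helper. This is only an illustration under stated assumptions, not part of the commit: the master addresses, tablet ID, and tablet server UUIDs are placeholder values, and the `run`, `check_health`, and `move_replica` function names are hypothetical. The two `kudu` subcommands are the ones the documentation text names (`ksck` is invoked as `kudu cluster ksck`). How a tablet server or master is stopped and restarted depends on the deployment, so those steps are left as comments.

```shell
#!/bin/sh
# Dry-run sketch of the decommissioning steps. By default it only prints
# the commands; set DRY_RUN=0 to execute them against a real cluster.
# MASTERS is a placeholder list of master RPC addresses (an assumption).
MASTERS="${MASTERS:-master-1:7051,master-2:7051,master-3:7051}"

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1 (and later progress checks): report cluster health with ksck.
check_health() {
  run kudu cluster ksck "$MASTERS"
}

# Step 2: move one replica of a replication-factor-1 table off the server.
# Arguments: tablet ID, source tablet server UUID, destination server UUID.
move_replica() {
  run kudu tablet change_config move_replica "$MASTERS" "$1" "$2" "$3"
}

# Step 3: shut down the tablet server (deployment-specific, not shown),
#         then re-run check_health to watch re-replication progress.
# Step 4: restart the masters, one at a time if there are several
#         (deployment-specific, not shown).
```

A possible invocation, with hypothetical identifiers, is `check_health` followed by `move_replica <tablet_id> <from_ts_uuid> <to_ts_uuid>` for each singly-replicated tablet on the server. The sketch deliberately does not loop until `ksck` succeeds: as the text notes, `ksck` keeps reporting the stopped tablet server as unavailable until the masters are restarted.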
