cassandra-commits mailing list archives

From Apache Wiki <>
Subject [Cassandra Wiki] Update of "Operations" by JonathanEllis
Date Mon, 05 Apr 2010 22:42:08 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "Operations" page has been changed by JonathanEllis.
The comment on this change is: move Streaming to its own page, and link it in Bootstrap and
Move sections.


   1. As a safety measure, Cassandra does not automatically remove data from nodes that "lose"
part of their Token Range to a newly added node.  Run "nodetool cleanup" on the source node(s)
(neighboring nodes that shared the same subrange) when you are satisfied the new node is up
and working. If you do not do this, the old data will still be counted against the load on
that node, and future bootstrap attempts at choosing a location will be thrown off.
   1. When bootstrapping a new node, existing nodes have to divide the key space before beginning
replication.  This can take a while, so be patient.
   1. During bootstrap, a node will drop the Thrift port and will not be accessible from `nodetool`.
+  1. Bootstrap can take many hours when a lot of data is involved.  See [[Streaming]] for
how to monitor progress.
  Cassandra is smart enough to transfer data from the nearest source node(s), if your !EndpointSnitch
is configured correctly.  So, the new node doesn't need to be in the same datacenter as the
primary replica for the Range it is bootstrapping into, as long as another replica is in the
datacenter with the new one.
@@ -79, +80 @@

  === Moving nodes ===
  `nodetool move`: move the target node to a given Token. Moving is essentially a convenience
over decommission + bootstrap.
+ As with bootstrap, see [[Streaming]] for how to monitor progress.
  === Load balancing ===
  `nodetool loadbalance`: also essentially a convenience over decommission + bootstrap, only
instead of telling the target node where to move on the ring it will choose its location based
on the same heuristic as Token selection on bootstrap.
@@ -177, +180 @@

  FLUSH-WRITER-POOL                 0         0            218
  HINTED-HANDOFF-POOL               0         0            154
- === Streaming ===
- The status of streaming on both origination and destination nodes can be monitored through
the `org.apache.cassandra.streaming.StreamingService` MBean.
- The `Status` attribute gives an easy indication of what a node is doing with respect to
streaming.  During the bulk of a transfer the sending node will report a status of `"Waiting
for transfer to $some_node to complete."`  The receiving node will report `"Receiving stream"`
while receiving stream data.  The `StreamDestinations` and `StreamSources` attributes each
contain a list of hosts that the current node is either sending stream data to or receiving
it from.
- The operations `getOutgoingFiles(host)` and `getIncomingFiles(host)` each return a list
of strings describing the status of individual files being streamed to and from a given host.
 Each string follows this format:  `[path to file] [bytes sent/received]/[file size]`.  If you
think that streaming is taking too long on your cluster, the first thing you should do is
check `StreamSources` or `StreamDestinations` to figure out which hosts are streaming files.
 Use those hosts as inputs to `getOutgoingFiles()` or `getIncomingFiles()` to check on the
status of individual files from the problematic source and destination nodes.  Streaming is
conducted in 32MB chunks, so you should refresh the file status after a few seconds to see
if the sent/received values change.  If they do not change, or change more slowly than you'd
like, something is wrong.  Keep in mind that a source node can only stream a single file at
a time, but a destination node can simultaneously receive several files.
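
The refresh-and-compare check described above can be sketched in a few lines of Python. This is
only an illustration, not part of Cassandra: it assumes the status strings have already been
fetched from `getOutgoingFiles(host)` or `getIncomingFiles(host)` by some JMX client, and that
the square brackets in the format are placeholders rather than literal characters. The names
`FileProgress`, `parse_status` and `stalled`, and the sample paths, are hypothetical.

```python
from typing import List, NamedTuple


class FileProgress(NamedTuple):
    path: str
    transferred: int  # bytes sent or received so far
    total: int        # file size in bytes

    @property
    def fraction(self) -> float:
        # Treat an empty file as already complete to avoid dividing by zero.
        return self.transferred / self.total if self.total else 1.0


def parse_status(line: str) -> FileProgress:
    """Parse one '<path> <bytes>/<size>' status string (assumed layout)."""
    # Split from the right so that a path containing spaces stays intact.
    path, _, counts = line.strip().rpartition(" ")
    transferred, _, total = counts.partition("/")
    return FileProgress(path, int(transferred), int(total))


def stalled(before: List[str], after: List[str]) -> List[str]:
    """Return paths of unfinished files whose byte count did not grow between two polls."""
    earlier = {p.path: p.transferred for p in map(parse_status, before)}
    return [
        p.path
        for p in map(parse_status, after)
        if p.path in earlier
        and p.transferred <= earlier[p.path]
        and p.transferred < p.total
    ]


if __name__ == "__main__":
    # Hypothetical snapshots taken a few seconds apart.
    first = ["/data/Keyspace1/Standard1-5-Data.db 33554432/134217728"]
    second = ["/data/Keyspace1/Standard1-5-Data.db 33554432/134217728"]
    print(stalled(first, second))
```

A file that shows up in `stalled` after a single short interval is not necessarily stuck, since
progress is only reported per chunk; repeating the comparison over a longer interval gives a
more reliable signal.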
