cassandra-commits mailing list archives

From Apache Wiki <>
Subject [Cassandra Wiki] Update of "Streaming" by JonathanEllis
Date Mon, 05 Apr 2010 22:25:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "Streaming" page has been changed by JonathanEllis.
The comment on this change is: describe streaming steps + anticompaction.


New page:
When data needs to be moved from one node (the source) to another (the destination), the following
steps occur:

 1. The destination sends the source a request listing the data ranges it wants.
 1. The source copies the data in those ranges into new sstable files in preparation for streaming.
 This is called anticompaction (because compaction merges multiple sstable files into one,
 and this does the opposite).
 1. The source sends the destination the list of files to be streamed, followed by the files themselves.
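
The steps above can be illustrated with a toy model. This is a sketch under heavy simplifying assumptions, not Cassandra's implementation. The `StreamingSketch` and `Range` classes and the integer-token "sstables" are hypothetical, and network messages are reduced to method calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the three streaming steps. Rows are identified by
// integer tokens and an "sstable" is just a list of tokens; real sstables,
// token ranges, and streaming messages are far more involved.
final class StreamingSketch {

    /** A token range (left, right], exclusive on the left like a ring range. */
    static final class Range {
        final int left;
        final int right;

        Range(int left, int right) {
            this.left = left;
            this.right = right;
        }

        boolean contains(int token) {
            return token > left && token <= right;
        }
    }

    /**
     * Step 2, anticompaction: instead of merging sstables into one, copy the
     * rows that fall inside the requested ranges out of each existing sstable
     * into new files set aside for streaming. The originals are left untouched.
     */
    static List<List<Integer>> anticompact(List<List<Integer>> sstables, List<Range> ranges) {
        List<List<Integer>> toStream = new ArrayList<>();
        for (List<Integer> sstable : sstables) {
            List<Integer> copy = new ArrayList<>();
            for (int token : sstable) {
                for (Range r : ranges) {
                    if (r.contains(token)) {
                        copy.add(token);
                        break; // ranges may overlap; copy each row only once
                    }
                }
            }
            if (!copy.isEmpty()) {
                toStream.add(copy); // no file for sstables with nothing in range
            }
        }
        return toStream;
    }

    /** Steps 1-3 end to end, returning the rows the destination ends up with. */
    static List<Integer> stream(List<List<Integer>> sourceSstables, List<Range> requested) {
        // Step 1: the destination's request carries the ranges it wants.
        // Step 2: the source anticompacts those ranges into new files.
        List<List<Integer>> files = anticompact(sourceSstables, requested);
        // Step 3: the source sends the list of files, then each file in turn;
        // the destination collects the rows it receives.
        List<Integer> received = new ArrayList<>();
        for (List<Integer> file : files) {
            received.addAll(file);
        }
        return received;
    }
}
```

In this model, only the rows inside the requested ranges ever leave the source. That is why the source must do the anticompaction work before any transfer can begin.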

Streaming status on both the source and destination nodes can be monitored through the
`org.apache.cassandra.streaming.StreamingService` MBean. Its `Status` attribute gives a
quick indication of what a node is doing with respect to streaming.
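
To read `Status` from code instead of jconsole, the standard `javax.management` remote client is enough. This is a minimal sketch with several assumptions. The ObjectName string `org.apache.cassandra.streaming:type=StreamingService` is assumed to be where the MBean is registered (confirm it in jconsole for your version). The host and JMX port are whatever your node's startup script configures. The `StreamingStatus` class is hypothetical.

```java
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class StreamingStatus {
    // Assumption: the ObjectName under which the StreamingService MBean is
    // registered. Verify it in jconsole for your Cassandra version.
    static final String STREAMING_MBEAN = "org.apache.cassandra.streaming:type=StreamingService";

    static ObjectName streamingName() {
        try {
            return new ObjectName(STREAMING_MBEAN);
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e); // only if the constant is malformed
        }
    }

    /** Standard JMX-over-RMI URL; host and port are whatever your node exposes. */
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Connects to a running node and returns its streaming Status attribute. */
    static String readStatus(String host, int port) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            return String.valueOf(conn.getAttribute(streamingName(), "Status"));
        }
    }
}
```

Call `readStatus(host, port)` against both the source and the destination to see what each one reports.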

Step 2 takes the most time on most systems. The destination is idle during this stage; to
monitor anticompaction progress, check the `Compaction` MBean on the source node.
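
The attributes exposed by the `Compaction` MBean vary between versions, so it can help to dump whatever it offers rather than hard-coding attribute names. This sketch does that generically. The hypothetical `MBeanDump` class lists every readable attribute of each MBean whose ObjectName contains a given substring, such as `Compaction`. It only uses standard `javax.management` calls and assumes nothing about Cassandra's attribute names.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class MBeanDump {
    /**
     * Returns the readable attributes of every MBean whose ObjectName contains
     * the given substring (e.g. "Compaction"), keyed by ObjectName string.
     */
    static Map<String, Map<String, Object>> dumpMatching(MBeanServerConnection conn, String substring)
            throws IOException {
        Map<String, Map<String, Object>> out = new TreeMap<>();
        for (ObjectName name : conn.queryNames(null, null)) {
            if (!name.toString().contains(substring)) continue;
            MBeanInfo info;
            try {
                info = conn.getMBeanInfo(name);
            } catch (JMException e) {
                continue; // MBean vanished or cannot be introspected; skip it
            }
            Map<String, Object> attrs = new TreeMap<>();
            for (MBeanAttributeInfo attr : info.getAttributes()) {
                if (!attr.isReadable()) continue;
                try {
                    attrs.put(attr.getName(), conn.getAttribute(name, attr.getName()));
                } catch (JMException | RuntimeException e) {
                    // Some attributes throw when read; record that instead of failing.
                    attrs.put(attr.getName(), "<unreadable: " + e.getClass().getSimpleName() + ">");
                }
            }
            out.put(name.toString(), attrs);
        }
        return out;
    }
}
```

Run it against the source node's `MBeanServerConnection` a few times during anticompaction, and watch which values move.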

Once step 3 begins the actual data transfer, the sending node reports a status of
`"Waiting for transfer to $some_node to complete."` The receiving node reports
`"Receiving stream"` while it receives stream data. The `StreamDestinations` and
`StreamSources` attributes list the hosts the current node is sending stream data to and
receiving it from, respectively.
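
Reading the two host-list attributes is a plain `getAttribute` call. The Java type they return is not stated here, so this sketch (hypothetical `StreamPeers` class) normalizes whatever comes back into a list of strings. It accepts collections, arrays, or a single value, and it renders `InetAddress` elements as IP strings. The attribute names are taken from the text above.

```java
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class StreamPeers {
    /**
     * Converts an attribute value into a list of host strings. The concrete
     * type returned by the MBean is an assumption, so collections, arrays, and
     * single values are all accepted; InetAddress elements become IP strings.
     */
    static List<String> toHostList(Object value) {
        List<String> hosts = new ArrayList<>();
        if (value == null) return hosts;
        Collection<?> items;
        if (value instanceof Collection<?>) {
            items = (Collection<?>) value;
        } else if (value instanceof Object[]) {
            items = Arrays.asList((Object[]) value);
        } else {
            items = Collections.singletonList(value);
        }
        for (Object item : items) {
            hosts.add(item instanceof InetAddress
                    ? ((InetAddress) item).getHostAddress()
                    : String.valueOf(item));
        }
        return hosts;
    }

    /** Hosts this node is currently sending stream data to. */
    static List<String> destinations(MBeanServerConnection conn, ObjectName streaming) throws Exception {
        return toHostList(conn.getAttribute(streaming, "StreamDestinations"));
    }

    /** Hosts this node is currently receiving stream data from. */
    static List<String> sources(MBeanServerConnection conn, ObjectName streaming) throws Exception {
        return toHostList(conn.getAttribute(streaming, "StreamSources"));
    }
}
```

Pass an open `MBeanServerConnection` for the node you are inspecting, along with the ObjectName of its StreamingService MBean.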

The operations `getOutgoingFiles(host)` and `getIncomingFiles(host)` return lists of strings
describing the status of individual files being streamed to and from a given host,
respectively. Each string has the format `[path to file] [bytes sent/received]/[file size]`.

If you think streaming is taking too long on your cluster, first check `StreamSources` or
`StreamDestinations` to find out which hosts are streaming files. Use those hosts as inputs
to `getOutgoingFiles()` or `getIncomingFiles()` to check the status of individual files on
the problematic source and destination nodes. Streaming is conducted in 32MB chunks, so
refresh the file status after a few seconds and see whether the sent/received values change.
If they do not change, or change more slowly than you'd like, something is wrong. Keep in
mind that a source node can only stream a single file at a time, but a destination node can
receive several files simultaneously.
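
The "refresh and compare" check can be scripted. The following is a sketch, not a Cassandra tool. The `StreamProgress` class is hypothetical. The parser assumes exactly the `[path to file] [bytes sent/received]/[file size]` format, with a space before the byte counts. The `files` helper assumes the two operations take a single `String` host argument and return an iterable of strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class StreamProgress {
    /** One parsed line: "[path to file] [bytes sent/received]/[file size]". */
    static final class FileStatus {
        final String path;
        final long bytes;
        final long size;

        FileStatus(String path, long bytes, long size) {
            this.path = path;
            this.bytes = bytes;
            this.size = size;
        }
    }

    // Everything before the last run of whitespace is the path, then "<bytes>/<size>".
    private static final Pattern LINE = Pattern.compile("^(.*)\\s+(\\d+)/(\\d+)$");

    static FileStatus parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) throw new IllegalArgumentException("unexpected status line: " + line);
        return new FileStatus(m.group(1), Long.parseLong(m.group(2)), Long.parseLong(m.group(3)));
    }

    /** Paths whose byte count did not advance between two snapshots, excluding finished files. */
    static List<String> stalled(List<String> before, List<String> after) {
        Map<String, Long> earlier = new LinkedHashMap<>();
        for (String line : before) {
            FileStatus s = parse(line);
            earlier.put(s.path, s.bytes);
        }
        List<String> stuck = new ArrayList<>();
        for (String line : after) {
            FileStatus s = parse(line);
            Long prev = earlier.get(s.path);
            boolean finished = s.bytes >= s.size;
            if (prev != null && !finished && s.bytes <= prev) stuck.add(s.path);
        }
        return stuck;
    }

    /** Invokes getOutgoingFiles or getIncomingFiles; the String-host signature is assumed. */
    static List<String> files(MBeanServerConnection conn, ObjectName streaming, String operation, String host)
            throws Exception {
        Object result = conn.invoke(streaming, operation, new Object[] {host},
                new String[] {String.class.getName()});
        List<String> lines = new ArrayList<>();
        if (result instanceof Iterable<?>) {
            for (Object o : (Iterable<?>) result) lines.add(String.valueOf(o));
        }
        return lines;
    }
}
```

Take one snapshot with `files(...)` and wait a few seconds, long enough for a 32MB chunk to arrive. Then take a second snapshot and pass both to `stalled()`. Files that keep showing up there are the ones to investigate.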
