cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12229) Move streaming to non-blocking IO and netty (streaming 2.1)
Date Mon, 13 Mar 2017 13:20:41 GMT


Jason Brown commented on CASSANDRA-12229:

This ticket builds upon the netty integration work done in CASSANDRA-8457, and is a requirement
for committing that ticket. In brief, there are several major components of this patch:

- allow for parallel streaming of files in a {{StreamSession}}. See {{NettyStreamingMessageSender}}
in the patch.
- alter the session initialization/preparation phases by adding a few additional messages.
- communicate all streaming-related messages, barring the actual file transfer ({{OutgoingFileMessage}}),
over the standard internode messaging connections, and not the stream connection(s).
Stream connections are now only responsible for the file transfer. Moving the stream messages
off the stream connections allows session management to be treated more independently,
and thus supports parallel file transfer.
- rework the actual file send/receive logic to work optimally within a netty context.
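
As a rough illustration of the message split and parallel transfer described above, here is a minimal Java sketch. It is not code from the patch: {{StreamRoutingSketch}}, {{Channel}}, {{Kind}}, {{route}} and {{transferAll}} are hypothetical names, the message kinds are simplified stand-ins, and a plain thread pool stands in for the netty machinery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these types are Cassandra classes.
class StreamRoutingSketch {
    enum Channel { MESSAGING, STREAM }

    // Simplified stand-ins for the kinds of stream messages (assumed names).
    enum Kind { INIT, PREPARE, FILE, RECEIVED, COMPLETE }

    // Only file transfers use the dedicated stream connection(s);
    // everything else goes over the normal internode messaging connections.
    static Channel route(Kind kind) {
        return kind == Kind.FILE ? Channel.STREAM : Channel.MESSAGING;
    }

    // Submits one task per file so transfers can overlap, then collects
    // the results in submission order. The "transfer" here is a placeholder.
    static List<String> transferAll(List<String> files, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String file : files) {
                pending.add(pool.submit(() -> "sent:" + file));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> task : pending) {
                results.add(task.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because session control messages never share a connection with file data in this model, a slow file transfer does not hold up session bookkeeping, which is what makes overlapping transfers straightforward.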

I've updated [~yukim]'s original (and excellent) documentation on {{StreamSession}} to reflect
the new messaging interactions, and have added extensive package-level documentation at {{org.apache.cassandra.streaming.async}}
which details the file transfer aspects.

Initial (rough) testing, with parallel sstable transfers, shows about a 15-20% reduction in
streaming latencies.

Current known issues which need to be addressed:
- error handling and session management on failure
- some broken dtests
- some broken {{BulkWriter}} interactions (I need to ferret out what's going on with these)

> Move streaming to non-blocking IO and netty (streaming 2.1)
> -----------------------------------------------------------
>                 Key: CASSANDRA-12229
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Streaming and Messaging
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>             Fix For: 4.0
> As followup work to CASSANDRA-8457, we need to move streaming to use netty.
> Streaming 2.0 (CASSANDRA-5286) brought many good improvements to how files are transferred
between nodes in a cluster. However, the low-level details of the current streaming implementation
do not line up nicely with a non-blocking model, so I think this is a good time to review
some of those details and add in additional goodness. The current implementation assumes a
sequential or "single threaded" approach to the sending of stream messages as well as the
transfer of files. In short, after several iterative prototypes, I propose the following:
> 1) use a single bidirectional connection (instead of requiring two sockets &
two threads)
> 2) send the "non-file" {{StreamMessage}} s (basically anything not {{OutgoingFileMessage}})
via the normal internode messaging. This will require a slight bit more management of the
session (the ability to look up a {{StreamSession}} from a static function on {{StreamManager}}),
but we have most of the pieces we need for this already.
> 3) switch to a non-blocking IO model (facilitated via netty)
> 4) Allow files to be streamed in parallel (CASSANDRA-4663) - this should just be a thing
> 5) If the entire sstable is to be streamed, in addition to the DATA component, transfer
all the components of the sstable (primary index, bloom filter, stats, and so on). This way
we can avoid the CPU and GC pressure from deserializing the stream into objects. File streaming
then amounts to a block-level transfer.
> Note: The progress/results of CASSANDRA-11303 will need to be reflected here, as well.
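
To make the "block-level transfer" idea in item 5 concrete, the following Java sketch copies a file's bytes verbatim with {{FileChannel.transferTo}}, without deserializing anything into objects. It is only an illustration under assumptions: {{BlockTransferSketch}}, {{copyBytes}} and {{roundTrip}} are hypothetical names, a local file stands in for the remote peer, and the ticket does not say that the patch uses this particular call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical sketch; not part of the Cassandra code base.
class BlockTransferSketch {
    // Copies the source file's bytes to the target as-is and returns the byte count.
    static long copyBytes(Path source, Path target) {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                                                StandardOpenOption.CREATE,
                                                StandardOpenOption.WRITE,
                                                StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long moved = in.transferTo(position, size - position, out);
                if (moved <= 0) {
                    break; // nothing more could be transferred
                }
                position += moved;
            }
            return position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes data to a temp file, copies it, and checks the copy is byte-identical.
    static boolean roundTrip(byte[] data) {
        try {
            Path source = Files.createTempFile("component", ".src");
            Path target = Files.createTempFile("component", ".dst");
            try {
                Files.write(source, data);
                long copied = copyBytes(source, target);
                return copied == data.length
                        && Arrays.equals(data, Files.readAllBytes(target));
            } finally {
                Files.deleteIfExists(source);
                Files.deleteIfExists(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch each sstable component would be handled the same way, one {{copyBytes}} call per component file, so the receiver ends up with the same files the sender had.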
