cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-11521) Implement streaming for bulk read requests
Date Mon, 11 Apr 2016 17:07:25 GMT


Aleksey Yeschenko commented on CASSANDRA-11521:

[~brianmhess] Does C*-Spark integration use CL.LOCAL_ONE for reads? I know we do use QUORUM
for writes, as a method for overload control.

A small hint on top of regular {{SELECT}} is a decent first step, but there is much more
we can do to make streaming faster if we go for something purpose-built instead (even if
built on top of the native protocol), with proper support from the driver.

Among other things, the protocol is very wasteful for the cases where you stream all the data,
especially if you have big partitions and a few clustering columns. While clustering column
repetition as part of cell names is now fully gone from sstables and the in-memory representation,
in the protocol itself we still repeat, with each row, both the clustering columns - even if
many rows share them - and the partition key columns. We could get rid of that repetition, and
all the related redundant serialisation, if we did not build on top of ResultSet.
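
To make the redundancy concrete, here is a minimal Python sketch. It is not the native protocol
encoding and not a proposed wire format; the row layout, the sample data and the helper names
({{nest}}, {{flatten}}, {{count_values}}) are all assumptions made for illustration. It only shows
that grouping sorted rows by partition key and leading clustering value carries fewer values
than repeating them on every row, and that the flat rows can still be recovered.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical flat rows, shaped the way a ResultSet-like response repeats them:
# (partition_key, clustering_1, clustering_2, value)
ROWS = [
    ("sensor-1", "2016-04-11", 0, 20.5),
    ("sensor-1", "2016-04-11", 1, 20.7),
    ("sensor-1", "2016-04-12", 0, 19.9),
    ("sensor-2", "2016-04-11", 0, 31.0),
]


def nest(rows):
    """Group sorted rows so each partition key and each leading
    clustering value is carried once instead of once per row."""
    nested = []
    for pk, pk_rows in groupby(rows, key=itemgetter(0)):
        groups = []
        for c1, c1_rows in groupby(pk_rows, key=itemgetter(1)):
            groups.append((c1, [(r[2], r[3]) for r in c1_rows]))
        nested.append((pk, groups))
    return nested


def flatten(nested):
    """Inverse of nest(): rebuild the flat, repetitive rows."""
    return [
        (pk, c1, c2, value)
        for pk, groups in nested
        for c1, tail in groups
        for c2, value in tail
    ]


def count_values(obj):
    """Count leaf values, as a rough stand-in for serialised fields."""
    if isinstance(obj, (list, tuple)):
        return sum(count_values(item) for item in obj)
    return 1
```

With this sample data the nested form holds fewer leaf values than the flat form, and the gap
grows with partition size, since every additional row in a partition saves one more partition
key and, where shared, one more clustering value.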

Secondly, it's not common at all to multiplex a single session between transactional and analytical
workloads, so a single Spark Java driver session is going to be dealing only with streaming
(maybe even only a single stream at a time?). We could add a new command ({{STREAM}}) that takes
a query and, say, a throughput limit or a maximum number of unacknowledged rows/bytes, and have
the server push as much as it can without violating the limits. The stream would be cancellable.
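
The "maximum number of unacknowledged rows" idea can be sketched as a simple window on the server
side. The following Python is only an illustration under stated assumptions: {{STREAM}} does not
exist as a command, and the {{StreamSession}} class, its method names and the row-count window
are hypothetical, not an existing Cassandra or driver API.

```python
_END = object()  # sentinel marking the end of the underlying row source


class StreamSession:
    """Hypothetical server-side state for one STREAM request."""

    def __init__(self, rows, max_unacked):
        self._rows = iter(rows)
        self._max_unacked = max_unacked  # limit on rows in flight
        self._unacked = 0
        self._cancelled = False

    def push(self):
        """Return as many rows as the window currently allows."""
        batch = []
        while not self._cancelled and self._unacked < self._max_unacked:
            row = next(self._rows, _END)
            if row is _END:
                break
            batch.append(row)
            self._unacked += 1
        return batch

    def ack(self, count):
        """Client acknowledged `count` rows, which reopens the window."""
        self._unacked = max(0, self._unacked - count)

    def cancel(self):
        """Stop the stream; later push() calls return nothing."""
        self._cancelled = True
```

In this sketch the server keeps pushing until the window is full, then waits for acknowledgements
rather than for a per-page request from the client. A byte-based or throughput-based limit would
follow the same shape with a different counter.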

Also, ideally, once we switch to the user-space page cache, these queries should not be polluting

> Implement streaming for bulk read requests
> ------------------------------------------
>                 Key: CASSANDRA-11521
>                 URL:
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Local Write-Read Paths
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
> Allow clients to stream data from a C* host, bypassing the coordination layer and eliminating
> the need to query individual pages one by one.
