cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11521) Implement streaming for bulk read requests
Date Mon, 11 Apr 2016 17:07:25 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235496#comment-15235496 ]

Aleksey Yeschenko commented on CASSANDRA-11521:
-----------------------------------------------

[~brianmhess] Does C*-Spark integration use CL.LOCAL_ONE for reads? I know we do use QUORUM
for writes, as a method for overload control.

A small hint on top of regular {{SELECT}} is a decent first step, but there is much more
we can do to make streaming faster if we go for something purpose-built instead
(even if built on top of the native protocol), with proper support from the driver.

Among other things, the protocol is very wasteful for the cases where you stream all the data,
especially if you have big partitions and a few clustering columns. While clustering column
repetition as part of cell names is now fully gone from sstables and in-memory representation,
in the protocol itself, with each row, we both repeat all the clustering columns - even if
many rows share them - and the partition key columns. We could get rid of that repetition,
and all the related redundant serialisation, by not building on top of ResultSet.
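
To make the redundancy concrete, here is a minimal sketch in Python. It is not the native protocol's wire format: JSON stands in for the real serialisation, and the names {{flat_encode}}, {{grouped_encode}} and {{grouped_decode}} are hypothetical. It only compares a ResultSet-style layout, where every row repeats its key columns, with a layout that emits a shared key prefix once per run of rows.

```python
import json


def flat_encode(rows):
    """ResultSet-style layout: every row carries all of its key columns."""
    return json.dumps(rows, separators=(",", ":")).encode()


def grouped_encode(rows, n_key_cols):
    """Emit each shared key prefix once, followed by the non-key columns of
    every consecutive row that shares it. n_key_cols is the number of leading
    columns (partition key plus shared clustering columns) treated as prefix."""
    groups = []
    for row in rows:
        prefix, rest = row[:n_key_cols], row[n_key_cols:]
        if groups and groups[-1][0] == prefix:
            groups[-1][1].append(rest)      # same prefix as the previous row
        else:
            groups.append([prefix, [rest]])  # start a new run
    return json.dumps(groups, separators=(",", ":")).encode()


def grouped_decode(data):
    """Rebuild the full rows from the grouped layout."""
    return [prefix + rest for prefix, rests in json.loads(data) for rest in rests]


# Hypothetical big partition: one partition key, one shared clustering column.
rows = [["sensor-1", "2016-04-11", i, i * 0.5] for i in range(100)]
flat = flat_encode(rows)
grouped = grouped_encode(rows, n_key_cols=2)
print(len(flat), len(grouped))
```

The grouped form decodes back to the same rows, so the saving comes purely from not repeating the key columns; how large it is depends on key width and on how many rows share a prefix.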

Secondly, it's not common at all to multiplex a single session between transactional and analytical
workloads. So a single Spark Java driver session is going to be dealing only with streaming
itself (maybe even only a single stream at a time?). We could add a new command ({{STREAM}}),
with query and, say, throughput limit, or maximum # of unacknowledged rows/bytes, and just
server-side push as much as we can without violating the limits. The stream would be cancellable.
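
The "maximum # of unacknowledged rows" limit could be sketched as a simple window, shown below in Python. This is an illustration under assumptions, not a proposed implementation: {{StreamSession}}, {{push}}, {{ack}} and {{cancel}} are hypothetical names, and no such command exists in the protocol today.

```python
from collections import deque


class StreamSession:
    """Hypothetical server-side state for one STREAM request."""

    def __init__(self, rows, max_unacked):
        self._pending = deque(rows)      # rows still to be sent
        self._max_unacked = max_unacked  # client-supplied limit
        self._unacked = 0                # rows sent but not yet acknowledged
        self.cancelled = False

    def push(self):
        """Return the rows the server may send now without exceeding the limit."""
        batch = []
        while (self._pending and not self.cancelled
               and self._unacked < self._max_unacked):
            batch.append(self._pending.popleft())
            self._unacked += 1
        return batch

    def ack(self, n):
        """Client acknowledges n rows, which reopens the window."""
        self._unacked -= min(n, self._unacked)

    def cancel(self):
        """Client cancels the stream; nothing further is pushed."""
        self.cancelled = True
        self._pending.clear()


session = StreamSession(range(10), max_unacked=4)
print(session.push())   # fills the window
print(session.push())   # window full, nothing more until an ack
session.ack(2)
print(session.push())   # two more rows
```

A byte-based or throughput-based limit would follow the same shape, with the counter tracking bytes or a token bucket instead of rows.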

Also, ideally, once we switch to the user-space page cache, these queries should not be polluting
it.

> Implement streaming for bulk read requests
> ------------------------------------------
>
>                 Key: CASSANDRA-11521
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11521
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Local Write-Read Paths
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
>
>
> Allow clients to stream data from a C* host, bypassing the coordination layer and eliminating
> the need to query individual pages one by one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
