cassandra-commits mailing list archives

From " Brian Hess (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8234) CTAS (CREATE TABLE AS SELECT)
Date Thu, 06 Aug 2015 17:44:12 GMT


 Brian Hess commented on CASSANDRA-8234:

A few questions:
1. For OSS C* (as opposed to DSE), will the Spark Master be visible to users other than C*
itself?  As in, will Cassandra be the only process/user able to execute Spark jobs?  Or will
users be able to submit jobs, start the SparkSQL thriftserver, etc?
2. How current will Spark be kept with Cassandra?  Will there be any guidance (or guarantees)
about how stale the bundled Spark may be, or how often Cassandra will be upgraded
to incorporate a new Spark release?  Same for the OSS spark-cassandra-connector.
3. Will there be a load-sharing system in place so that multiple CTAS queries can run simultaneously
(Spark in standalone mode will by default "reserve" all available cores and "lock out" another
Spark job)?
4. Will there be some "sandboxing" of Spark so that C* and Spark play nicely (with respect
to RAM, CPU, etc)?
5. My assumption is that "CREATE TABLE b(x INT, y INT, z INT) AS SELECT x, y, z FROM a WITH
PRIMARY KEY ((x), y)" [syntax for illustrative purposes only] will be an asynchronous operation.
That is, it will return "success" to the client, but the operation will run in the background.
First, is that correct?  If so, I think there will have to be a status, as there is for MVs and 2Is,
correct?  If not, what will we do about this query timing out?
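
To picture the asynchronous behavior assumed in question 5, here is a minimal Python sketch.
Everything in it is hypothetical: CtasJob, CtasStatus, and the in-memory list standing in for
table a are illustrative names, not Cassandra or Spark APIs. It only shows the shape of
"return to the client immediately, then poll a status".

```python
import threading
from enum import Enum


class CtasStatus(Enum):
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


class CtasJob:
    """Hypothetical handle for a background CREATE TABLE AS SELECT."""

    def __init__(self, source_rows, columns):
        self.status = CtasStatus.RUNNING
        self.rows_copied = 0
        self.target = []   # stands in for the new table b
        self.error = None
        # The constructor returns right away; the copy runs on a worker thread.
        self._thread = threading.Thread(
            target=self._run, args=(source_rows, columns), daemon=True
        )
        self._thread.start()

    def _run(self, source_rows, columns):
        try:
            for row in source_rows:
                # Project only the selected columns, as in SELECT x, y, z FROM a.
                self.target.append({c: row[c] for c in columns})
                self.rows_copied += 1
            self.status = CtasStatus.SUCCEEDED
        except Exception as exc:
            # Record the cause before flipping the status so pollers see both.
            self.error = exc
            self.status = CtasStatus.FAILED

    def wait(self, timeout=None):
        """Block for at most `timeout` seconds, then report the current status."""
        self._thread.join(timeout)
        return self.status


if __name__ == "__main__":
    a = [{"x": 1, "y": 2, "z": 3, "w": 9}, {"x": 4, "y": 5, "z": 6, "w": 9}]
    job = CtasJob(a, ["x", "y", "z"])   # client gets control back immediately
    print(job.wait(timeout=5), job.rows_copied)
```

A client-side timeout then only bounds how long the caller waits on the status, not how long
the copy itself may run, which is the distinction the question is getting at.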

> -----------------------------
>                 Key: CASSANDRA-8234
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Robin Schumacher
>             Fix For: 3.x
> A continuous request from users is the ability to do CREATE TABLE AS SELECT.  The simplest
> form would be copying the entire table.  More advanced would allow specifying the columns
> and UDF to call, as well as filtering rows out in WHERE.
> More advanced still would be to get all the way to allowing JOIN, for which we probably
> want to integrate Spark.

This message was sent by Atlassian JIRA
