cassandra-commits mailing list archives

From "Benjamin Lerer (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7622) Implement virtual tables
Date Tue, 24 Apr 2018 15:02:01 GMT


Benjamin Lerer commented on CASSANDRA-7622:

{quote}you can do arithmetic operations, searches and aggregations on them but the type was
Sorry, my comment was misleading. I just wanted to mention the fact that aggregates and arithmetic
operations do not work on {{TEXT}} values.

{quote}What would you think the table schema should look like?{quote}
I asked myself that question a lot. Due to CQL limitations, I do not think that there
is a perfect solution. In the end, my preferred schema is:
SYSTEM VIEW sv_table_metrics (keyspace TEXT,
                          table TEXT,
                          memtable_on_heap_size BIGINT,
                          memtable_off_heap_size BIGINT,
                          PRIMARY KEY (keyspace, table));

In the case of the {{table}} and {{keyspace}} metrics, that approach results in tables with
a large number of columns (even if we mitigate that by using user-defined types for histograms,
meters and timers), but it makes it easy to select different subsets of the data. You can query based
on {{keyspaces}}, {{tables}}, {{metrics}} and {{metric fields}}. At the same time, you can
efficiently select a specific metric value for a given table.
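
To make the access patterns concrete, here is a minimal in-memory sketch in Java of the layout above: one row per ({{keyspace}}, {{table}}) with one typed column per metric. The class {{TableMetricsSketch}}, its {{Row}} record and its method names are hypothetical and only model the idea; this is not Cassandra or DSE code.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the sv_table_metrics layout; not Cassandra code.
final class TableMetricsSketch {
    // One row per (keyspace, table), one numeric column per metric.
    record Row(String keyspace, String table,
               long memtableOnHeapSize, long memtableOffHeapSize) {}

    private final List<Row> rows;

    TableMetricsSketch(List<Row> rows) {
        this.rows = List.copyOf(rows);
    }

    // Point lookup: one metric of one table, addressed by the full primary key.
    Optional<Long> memtableOnHeapSize(String keyspace, String table) {
        return rows.stream()
                   .filter(r -> r.keyspace().equals(keyspace) && r.table().equals(table))
                   .map(Row::memtableOnHeapSize)
                   .findFirst();
    }

    // Partition-level query: one metric for every table of a keyspace.
    Map<String, Long> memtableOffHeapSizeByTable(String keyspace) {
        return rows.stream()
                   .filter(r -> r.keyspace().equals(keyspace))
                   .collect(Collectors.toMap(Row::table, Row::memtableOffHeapSize));
    }

    // Numeric (BIGINT-like) columns can be aggregated directly,
    // which would not be possible if the values were stored as text.
    long totalMemtableOnHeapSize(String keyspace) {
        return rows.stream()
                   .filter(r -> r.keyspace().equals(keyspace))
                   .mapToLong(Row::memtableOnHeapSize)
                   .sum();
    }
}
```

The three methods correspond to selecting a single metric value for a given table, selecting a metric across a keyspace, and aggregating over a typed metric column.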

{quote}I am not fussy about naming. However, using the same terminology does confuse users
as they may expect the same feature set from Cassandra as they got in their relational database.
I would personally avoid it.{quote}

Based on my experience working on CQL tickets and my interactions with users or discussions
with evangelists, I came to two conclusions.
# If the feature is similar to one that they know from the relational world, people prefer
that you use the same name. It is easier for them to recognize it and to understand how it
should be used.
# If the feature behaves differently from its counterpart in the relational world, you should
be careful and use a different name, or it will backfire.

In this case, there is no real difference between us and the relational world. Because of that,
I think it would be a mistake not to reuse the name.
The {{Virtual Table}} name is, in my opinion, the really confusing one. It just makes me think
of some form of pluggable storage. Coming from the SQL world, it is not the name I would search for
on Google to figure out how to access system information in Cassandra.

{quote}do you have a design or code that you can share? It would be great if you can post
it. Is there a timeline around when you'll post it?{quote}

At a high level there are some similarities between [~cnlwsu]'s patch and ours. We have introduced
some {{ReadQuery}} subclasses that delegate calls to {{SystemViews}} and slightly refactored
the CQL layer to allow it to work on top of all {{ReadQuery}} implementations. The advantage
of that approach is that the existing CQL functionality is automatically supported on top of
the {{SystemViews}} and much less conditional logic is required to add support for {{SystemViews}}.

[~cnlwsu]'s current patch does not support some range queries or multi-partition queries, for
example. It will fail with an {{[Invalid query] message="IN restrictions are not supported on indexed
columns"}} error for multi-partition queries. We avoided that kind of risk with our approach.

That reduced the logic of our {{SystemView}} implementation to just fetching the requested
data or updating it.
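
The delegation idea can be sketched roughly as follows. Everything here is a simplified, hypothetical stand-in written for illustration: the {{SystemView}} and {{ReadQuery}} interfaces below, their method signatures, and the {{SystemViewReadQuery}} and {{select}} names are assumptions and do not reproduce the actual Cassandra or DSE types. In particular, where the restriction is applied is an assumption of this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Simplified, hypothetical stand-ins; these are NOT the Cassandra or DSE interfaces.
final class SystemViewSketch {
    // A view only knows how to produce its current rows (column name -> value).
    interface SystemView {
        List<Map<String, Object>> fetch();
    }

    // Common read abstraction that the query layer is written against.
    interface ReadQuery {
        List<Map<String, Object>> execute();
    }

    // A ReadQuery variant that delegates to a SystemView instead of reading stored data.
    static final class SystemViewReadQuery implements ReadQuery {
        private final SystemView view;
        private final Predicate<Map<String, Object>> restriction;

        SystemViewReadQuery(SystemView view, Predicate<Map<String, Object>> restriction) {
            this.view = view;
            this.restriction = restriction;
        }

        @Override
        public List<Map<String, Object>> execute() {
            // Assumption of this sketch: restrictions are applied here, outside the view.
            return view.fetch().stream().filter(restriction).collect(Collectors.toList());
        }
    }

    // Generic query-layer logic: it accepts any ReadQuery,
    // so it needs no view-specific conditional branches.
    static List<Object> select(ReadQuery query, String column) {
        return query.execute().stream()
                    .map(row -> row.get(column))
                    .collect(Collectors.toList());
    }
}
```

Because {{select}} depends only on the {{ReadQuery}} abstraction, the same selection code would serve regular reads and view-backed reads alike, while the view itself stays limited to fetching data.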

Our current code has been designed for DSE, so I need to modify it to make it work on top
of C*. As I am also quite busy with some other tasks, it will probably take 2 weeks before
I finish the port.


> Implement virtual tables
> ------------------------
>                 Key: CASSANDRA-7622
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Tupshin Harper
>            Assignee: Chris Lohfink
>            Priority: Major
>             Fix For: 4.x
> There are a variety of reasons to want virtual tables, which would be any table that would be backed by an API, rather than data explicitly managed and stored as sstables.
> One possible use case would be to expose JMX data through CQL as a resurrection of CASSANDRA-3527.
> Another is a more general framework to implement the ability to expose yaml configuration information. So it would be an alternate approach to CASSANDRA-7370.
> A possible implementation would be in terms of CASSANDRA-7443, but I am not presupposing.
