calcite-dev mailing list archives

From Edmon Begoli <ebeg...@gmail.com>
Subject Re: "Standardizing" streaming SQL
Date Sat, 10 Mar 2018 21:22:23 GMT
Julian,

As you know, I joined the SQL standards group to represent Calcite, and
whatever other projects we can help through this interaction.

With that in mind, are there any particular streaming features supported by
Calcite that you, or others here, would suggest I advocate for including,
or check whether they are already included?

Please send as detailed a reference as you can. This is a pretty formal body,
and I want to give them something technically sound.

Thank you,
Edmon

On Saturday, February 10, 2018, Julian Hyde <jhyde@apache.org> wrote:

> As you know, I am a big believer that SQL is a great language not just
> for data at rest, but also data in flight. Calcite has extensions to
> SQL for streaming queries, and a reference implementation, and I have
> spoken about streaming SQL at several conferences over the years.
> Several projects, including Apex, Beam, Flink and Storm, have
> leveraged Calcite to add streaming SQL support.
>
> But SQL becomes truly valuable when people can assume that its
> features exist in every product in the market. It makes their
> applications portable, and it makes it easier for them to apply their
> skills to new products. So, it is important that streaming SQL becomes
> standard.
>
> The official SQL standard is written by ANSI/ISO and is dominated by
> large vendors, and I don't even know how to engage with them. But the
> interesting work on streaming systems is happening in Apache, so it
> makes sense to start closer to home. After conversations with folks
> from a few projects - some of those mentioned above, plus Kafka and
> Spark - a group of us have concluded that the next step is to develop
> a standard using the Apache way - by open discussion, making decisions
> by consensus, by iteratively developing and reviewing code, and by
> releasing that code periodically.
>
> How can you develop a standard by writing software? The idea is to
> develop a Test Compatibility Kit (TCK), a suite of tests that embodies
> the standard. If you are the author of a streaming engine, you can
> download the TCK and run it against your engine, and the test tells
> you whether your engine is compliant.
>
> The TCK is developed by committers from the participating engines. If
> we want to add a new feature to streaming SQL, say stream-to-stream
> joins, then we would add tests to the TCK, and achieve consensus about
> the SQL syntax and the expected behavior - which rows will be emitted,
> at what times, and in what order, for given inputs to a query.
>
> Our plan is to use this list - dev@calcite - for discussions, and use
> a github project (under Apache license but outside the ASF) for code
> and issues.
>
> Kenn Knowles has already created the project:
> https://github.com/Stream-SQL-TCK/Stream-SQL-TCK
>
> Next steps are to design a language for the tests, figure out which
> features we would like to test in our first release, and start writing
> the first few tests.
>
> Here are the basic features we might test in the first release:
> * SELECT ... FROM
> * WHERE
> * GROUP BY with Hop and Tumble windowing functions
> * UNION ALL
> * Query a table (no streams involved)
> * JOIN a stream to a stream
> * JOIN a stream to a static table
>
> Here are more advanced features we might test in later releases:
> * GROUP BY with Session windowing function
> * MATCH_RECOGNIZE
> * Arbitrary stateful processing
> * Injected UDFs
> * Windowed aggregate functions (OVER)
> * JOIN a stream to time-varying table
> * Mechanism to emit early results (EMIT)
>
> All of the above are subject to discussion & change.
>
> Here is my sketch of a test:
>
> test "filter-equals" {
>   decls {
>     CREATE Orders (TIMESTAMP rowtime, INT orderId, VARCHAR product);
>   }
>   queries {
>     Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
>   }
>   input {
>     Orders ('00:01', 10, 'beer')
>     Orders ('00:03', 11, 'soda')
>   }
>   output {
>     Q1 ('00:03', 11, 'soda')
>   }
> }
>
> Again, subject to change. Especially, don't worry too much about the
> syntax; that will certainly change. But it shows what pieces of
> information are necessary to define a test without making any
> reference to the engine that will execute that test.
>
> If you're interested in participating in this project, you are most
> welcome. Please raise your hand by joining the discussion on this
> list. Also, start logging cases in the github project, and start
> writing pull requests.
>
> Julian
>
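
As an illustration of the test sketch quoted above, here is a minimal
Python model of the four pieces of information it carries (declarations
are folded into the row type, plus queries, input and output) and of an
engine-independent check. Every name here (StreamTest, Engine,
toy_engine, run_test) is hypothetical and is not part of the
Stream-SQL-TCK project; the stand-in engine does not parse SQL at all
and simply hard-codes the filter from Q1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (rowtime, orderId, product), mirroring the Orders declaration in the sketch.
Row = Tuple[str, int, str]


@dataclass
class StreamTest:
    """Hypothetical in-memory form of one test from the sketch."""
    name: str
    queries: Dict[str, str]         # query label -> SQL text
    inputs: Dict[str, List[Row]]    # stream name -> rows in arrival order
    expected: Dict[str, List[Row]]  # query label -> rows in emission order


# An "engine" is any callable that takes SQL text and the input streams
# and returns the rows it emits. A real engine would sit behind an adapter.
Engine = Callable[[str, Dict[str, List[Row]]], List[Row]]


def toy_engine(sql: str, inputs: Dict[str, List[Row]]) -> List[Row]:
    """Stand-in engine: ignores the SQL and hard-codes the Q1 filter."""
    return [row for row in inputs["Orders"] if row[2] == "soda"]


def run_test(test: StreamTest, engine: Engine) -> Dict[str, bool]:
    """Run each query and report whether its output matches exactly."""
    return {
        label: engine(sql, test.inputs) == test.expected[label]
        for label, sql in test.queries.items()
    }


filter_equals = StreamTest(
    name="filter-equals",
    queries={"Q1": "SELECT STREAM * FROM Orders WHERE product = 'soda'"},
    inputs={"Orders": [("00:01", 10, "beer"), ("00:03", 11, "soda")]},
    expected={"Q1": [("00:03", 11, "soda")]},
)

print(run_test(filter_equals, toy_engine))
```

The comparison is on the ordered list of emitted rows only. Emission
times, which the quoted message also lists as part of the expected
behavior, are left out of this sketch.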
