calcite-dev mailing list archives

From Edmon Begoli <ebeg...@gmail.com>
Subject Re: "Standardizing" streaming SQL
Date Sat, 10 Feb 2018 07:53:02 GMT
Julian,

I am certainly interested in participating in the discussion, and in the
initiative, time permitting.
In my environment, streaming data from large environmental sensor networks
is a common challenge.

Riccardo Tomassini and I discussed our research interests and work in
stream reasoning just this week.

As for standards and influencing them: I participate in some data
standards committees, so that participation might give us (Calcite and
related projects) a voice in contributing to or influencing the
streaming standards.

You are right that it is mostly big vendors, but I think there is room
for us to have a say.

Thank you for the initiative,
Edmon

On Sat, Feb 10, 2018 at 2:44 AM, Julian Hyde <jhyde@apache.org> wrote:

> As you know, I am a big believer that SQL is a great language not just
> for data at rest, but also data in flight. Calcite has extensions to
> SQL for streaming queries, and a reference implementation, and I have
> spoken about streaming SQL at several conferences over the years.
> Several projects, including Apex, Beam, Flink and Storm, have
> leveraged Calcite to add streaming SQL support.
>
> But SQL becomes truly valuable when people can assume that its
> features exist in every product in the market. It makes their
> applications portable, and it makes it easier for them to apply their
> skills to new products. So, it is important that streaming SQL becomes
> standard.
>
> The official SQL standard is written by ANSI/ISO and is dominated by
> large vendors, and I don't even know how to engage with them. But the
> interesting work on streaming systems is happening in Apache, so it
> makes sense to start closer to home. After conversations with folks
> from a few projects - some of those mentioned above, plus Kafka and
> Spark - a group of us have concluded that the next step is to develop
> a standard using the Apache way - by open discussion, making decisions
> by consensus, by iteratively developing and reviewing code, and by
> releasing that code periodically.
>
> How can you develop a standard by writing software? The idea is to
> develop a Test Compatibility Kit (TCK), a suite of tests that embodies
> the standard. If you are the author of a streaming engine, you can
> download the TCK and run it against your engine, and the test tells
> you whether your engine is compliant.
>
> The TCK is developed by committers from the participating engines. If
> we want to add a new feature to streaming SQL, say stream-to-stream
> joins, then we would add tests to the TCK, and achieve consensus about
> the SQL syntax and the expected behavior - which rows will be emitted,
> at what times, and in what order, for given inputs to a query.
>
> Our plan is to use this list - dev@calcite - for discussions, and use
> a github project (under Apache license but outside the ASF) for code
> and issues.
>
> Kenn Knowles has already created the project:
> https://github.com/Stream-SQL-TCK/Stream-SQL-TCK
>
> Next steps are to design a language for the tests, figure out which
> features we would like to test in our first release, and start writing
> the first few tests.
>
> Here are the basic features we might test in the first release:
> * SELECT ... FROM
> * WHERE
> * GROUP BY with Hop and Tumble windowing functions
> * UNION ALL
> * Query a table (no streams involved)
> * JOIN a stream to a stream
> * JOIN a stream to a static table
>
> Here are more advanced features we might test in later releases:
> * GROUP BY with Session windowing function
> * MATCH_RECOGNIZE
> * Arbitrary stateful processing
> * Injected UDFs
> * Windowed aggregate functions (OVER)
> * JOIN a stream to time-varying table
> * Mechanism to emit early results (EMIT)
>
> All of the above are subject to discussion & change.
>
> Here is my sketch of a test:
>
> test "filter-equals" {
>   decls {
>     CREATE Orders (TIMESTAMP rowtime, INT orderId, VARCHAR product);
>   }
>   queries {
>     Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
>   }
>   input {
>     Orders ('00:01', 10, 'beer')
>     Orders ('00:03', 11, 'soda')
>   }
>   output {
>     Q1 ('00:03', 11, 'soda')
>   }
> }
>
> Again, subject to change. Especially, don't worry too much about the
> syntax; that will certainly change. But it shows what pieces of
> information are necessary to define a test without making any
> reference to the engine that will execute that test.
>
> If you're interested in participating in this project, you are most
> welcome. Please raise your hand by joining the discussion on this
> list. Also, start logging cases in the github project, and start
> writing pull requests.
>
> Julian
>
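
To make the "filter-equals" sketch above concrete, here is one way such a test could be held as plain data and checked against an engine. This is only an illustration in Python, not the TCK's actual design: `StreamTest`, `Engine`, `run_test` and `toy_engine` are hypothetical names, and the toy engine does not parse SQL at all. It hard-codes the filter so that the example can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A row of the Orders stream from the sketch: (rowtime, orderId, product).
Row = Tuple[str, int, str]


@dataclass
class StreamTest:
    """Hypothetical, engine-independent description of one test."""
    name: str
    queries: Dict[str, str]         # query label -> SQL text
    inputs: Dict[str, List[Row]]    # stream name -> rows, in arrival order
    expected: Dict[str, List[Row]]  # query label -> rows, in emission order


# Assumed adapter shape: given SQL text and the input streams,
# an engine returns the rows it emits, in order.
Engine = Callable[[str, Dict[str, List[Row]]], List[Row]]


def run_test(test: StreamTest, engine: Engine) -> Dict[str, bool]:
    """Run every query in the test; report pass/fail per query label."""
    results = {}
    for label, sql in test.queries.items():
        actual = engine(sql, test.inputs)
        # List equality is order-sensitive, so emission order is checked too.
        results[label] = actual == test.expected[label]
    return results


def toy_engine(sql: str, inputs: Dict[str, List[Row]]) -> List[Row]:
    """Stand-in engine: ignores the SQL and hard-codes product = 'soda'."""
    return [row for row in inputs["Orders"] if row[2] == "soda"]


FILTER_EQUALS = StreamTest(
    name="filter-equals",
    queries={"Q1": "SELECT STREAM * FROM Orders WHERE product = 'soda'"},
    inputs={"Orders": [("00:01", 10, "beer"), ("00:03", 11, "soda")]},
    expected={"Q1": [("00:03", 11, "soda")]},
)

if __name__ == "__main__":
    print(run_test(FILTER_EQUALS, toy_engine))  # {'Q1': True}
```

A real adapter would hand the SQL text to the engine under test instead of ignoring it. The test description itself makes no reference to any engine, which is the property the sketch is meant to show.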
