calcite-dev mailing list archives

From Michael Mior <>
Subject Re: "Standardizing" streaming SQL
Date Sat, 10 Feb 2018 14:42:00 GMT
This looks like a great start! Not sure I'll be able to contribute much for
the time being, but this sounds like a good plan and I'll be interested to
follow future developments.

Michael Mior

2018-02-10 2:44 GMT-05:00 Julian Hyde <>:

> As you know, I am a big believer that SQL is a great language not just
> for data at rest, but also data in flight. Calcite has extensions to
> SQL for streaming queries, and a reference implementation, and I have
> spoken about streaming SQL at several conferences over the years.
> Several projects, including Apex, Beam, Flink and Storm, have
> leveraged Calcite to add streaming SQL support.
> But SQL becomes truly valuable when people can assume that its
> features exist in every product in the market. It makes their
> applications portable, and it makes it easier for them to apply their
> skills to new products. So, it is important that streaming SQL becomes
> standard.
> The official SQL standard is written by ANSI/ISO and is dominated by
> large vendors, and I don't even know how to engage with them. But the
> interesting work on streaming systems is happening in Apache, so it
> makes sense to start closer to home. After conversations with folks
> from a few projects - some of those mentioned above, plus Kafka and
> Spark - a group of us have concluded that the next step is to develop
> a standard using the Apache way - by open discussion, making decisions
> by consensus, by iteratively developing and reviewing code, and by
> releasing that code periodically.
> How can you develop a standard by writing software? The idea is to
> develop a Test Compatibility Kit (TCK), a suite of tests that embodies
> the standard. If you are the author of a streaming engine, you can
> download the TCK and run it against your engine, and the test tells
> you whether your engine is compliant.
> The TCK is developed by committers from the participating engines. If
> we want to add a new feature to streaming SQL, say stream-to-stream
> joins, then we would add tests to the TCK, and achieve consensus about
> the SQL syntax and the expected behavior - which rows will be emitted,
> at what times, and in what order, for given inputs to a query.
> Our plan is to use this list - dev@calcite - for discussions, and use
> a GitHub project (under the Apache license but outside the ASF) for code
> and issues.
> Kenn Knowles has already created the project:
> Next steps are to design a language for the tests, figure out which
> features we would like to test in our first release, and start writing
> the first few tests.
> Here are the basic features we might test in the first release:
> * GROUP BY with Hop and Tumble windowing functions
> * Query a table (no streams involved)
> * JOIN a stream to a stream
> * JOIN a stream to a static table
> Here are more advanced features we might test in later releases:
> * GROUP BY with Session windowing function
> * Arbitrary stateful processing
> * Injected UDFs
> * Windowed aggregate functions (OVER)
> * JOIN a stream to time-varying table
> * Mechanism to emit early results (EMIT)
> All of the above are subject to discussion & change.
> Here is my sketch of a test:
> test "filter-equals" {
>   decls {
>     CREATE Orders (TIMESTAMP rowtime, INT orderId, VARCHAR product);
>   }
>   queries {
>     Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
>   }
>   input {
>     Orders ('00:01', 10, 'beer')
>     Orders ('00:03', 11, 'soda')
>   }
>   output {
>     Q1 ('00:03', 11, 'soda')
>   }
> }
> Again, subject to change. Especially, don't worry too much about the
> syntax; that will certainly change. But it shows what pieces of
> information are necessary to define a test without making any
> reference to the engine that will execute that test.
> If you're interested in participating in this project, you are most
> welcome. Please raise your hand by joining the discussion on this
> list. Also, start logging cases in the GitHub project, and start
> writing pull requests.
> Julian
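
One way to picture the engine-neutral test sketched in the quoted message is
as plain data plus a thin adapter per engine. The Python below is only an
illustration under stated assumptions: StreamTest, run_test, toy_engine and
broken_engine are hypothetical names, not part of any TCK, and the toy engine
hard-codes the Q1 predicate instead of parsing SQL.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Tuple  # one row, e.g. (rowtime, orderId, product)
Tagged = Tuple[str, Row]  # (stream name or query label, row)


@dataclass
class StreamTest:
    """Hypothetical container mirroring the decls/queries/input/output sketch."""
    name: str
    decls: List[str]
    queries: Dict[str, str]
    inputs: List[Tagged]    # rows in arrival order, tagged with their stream
    expected: List[Tagged]  # rows in expected emission order, tagged with query label


# An engine adapter receives the declarations, queries and input rows,
# and returns the rows it emitted, tagged with the query label.
Engine = Callable[[List[str], Dict[str, str], List[Tagged]], List[Tagged]]


def run_test(test: StreamTest, engine: Engine) -> bool:
    """Return True if the engine emits exactly the expected rows, in order."""
    return engine(test.decls, test.queries, test.inputs) == test.expected


FILTER_EQUALS = StreamTest(
    name="filter-equals",
    decls=["CREATE Orders (TIMESTAMP rowtime, INT orderId, VARCHAR product)"],
    queries={"Q1": "SELECT STREAM * FROM Orders WHERE product = 'soda'"},
    inputs=[
        ("Orders", ("00:01", 10, "beer")),
        ("Orders", ("00:03", 11, "soda")),
    ],
    expected=[("Q1", ("00:03", 11, "soda"))],
)


def toy_engine(decls, queries, rows):
    # Stand-in for a real engine: hard-codes Q1's filter, does not parse SQL.
    return [("Q1", row) for stream, row in rows
            if stream == "Orders" and row[2] == "soda"]


def broken_engine(decls, queries, rows):
    # Deliberately non-compliant: ignores the WHERE clause.
    return [("Q1", row) for _, row in rows]


if __name__ == "__main__":
    for engine in (toy_engine, broken_engine):
        status = "compliant" if run_test(FILTER_EQUALS, engine) else "NOT compliant"
        print(f"{engine.__name__}: {status}")
```

The test itself never mentions an engine; only the adapter does. A real
harness would also need to compare emission times, which this sketch leaves
out.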
