calcite-dev mailing list archives

From Julian Hyde <jh...@apache.org>
Subject Re: About Stream SQL
Date Thu, 04 Feb 2016 08:35:53 GMT
I totally agree with you. (Sorry for the delayed response; this week has been very busy.)

There is a tendency among vendors (and projects) to think that their technology is unique
and superior to everyone else’s, and to want to showcase it in their own dialect of SQL. That
is natural, and it’s OK, since it makes them strive to make their technology better.

However, they have to remember that end users don’t want something unique; they want
something that solves their problem. They would like something standards-compliant, so
that it is easy to learn, easy to hire developers for, and, if the worst comes to the
worst, easy to migrate to a compatible competing technology.

I know the developers at Storm and Flink (and Samza too) and they understand the importance
of collaborating on a standard.

I have been trying to play a dual role: supplying the parser and planner for streaming SQL,
and facilitating the creation of a standard language and semantics for streaming SQL.
For the latter, see the Streaming page on Calcite’s web site[1]. On that page, I intend to illustrate
all of the main patterns of streaming queries, give them names (e.g. “Tumbling windows”),
and show how those translate into streaming SQL.
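
As a rough illustration of the “Tumbling windows” pattern (a sketch in Python, not Calcite
code): each event falls into exactly one fixed-size window, here one hour wide. The Streaming
page expresses the same idea in SQL by grouping on FLOOR(rowtime TO HOUR). The function name
tumbling_hourly_counts and the sample events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def tumbling_hourly_counts(events):
    """Count events per (hour window, product_id).

    Each event is a (timestamp, product_id) pair. Truncating the timestamp
    to the hour acts as the window key, so every event lands in exactly
    one window (the windows do not overlap).
    """
    counts = defaultdict(int)
    for ts, product_id in events:
        window_start = ts.replace(minute=0, second=0, microsecond=0)
        counts[(window_start, product_id)] += 1
    return dict(counts)


# Hypothetical sample data, for illustration only.
events = [
    (datetime(2016, 2, 4, 8, 5), 10),
    (datetime(2016, 2, 4, 8, 35), 10),
    (datetime(2016, 2, 4, 9, 1), 10),
    (datetime(2016, 2, 4, 9, 2), 20),
]
result = tumbling_hourly_counts(events)
```

Because each event contributes to exactly one window, the counts across all windows add up
to the number of input events.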

Also, it would be useful to create a reference implementation of streaming SQL in Calcite
so that you can validate and run queries. The performance, scalability and reliability will
not be the same as if you ran Storm, Flink or Samza, but at least you can see what the semantics
should be.

I believe that most, if not all, of the examples that the projects are coming up with can
be translated into SQL. It will be challenging, because we want to preserve the semantics
of SQL, allow streaming SQL to interoperate with traditional relations, and also retain the
general look and feel of SQL. (For example, I fought quite hard[2] recently for the principle
that GROUP BY defines a partition (in the set-theory sense)[3] and therefore could not be
used to represent a tumbling window, until I remembered that GROUPING SETS already allows
each input row to appear in more than one output sub-total.)
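
To make the GROUPING SETS point concrete, here is a small Python sketch (an emulation under
my own assumptions, not how Calcite implements it). A plain GROUP BY places each row in
exactly one group, whereas GROUPING SETS evaluates several groupings over the same rows,
so each row contributes to one sub-total per grouping set. The helper names group_by and
grouping_sets and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_by(rows, key_columns):
    """Sum 'units' per key; each row contributes to exactly one group."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        totals[key] += row["units"]
    return dict(totals)


def grouping_sets(rows, sets):
    """Emulate GROUP BY GROUPING SETS: run one GROUP BY per grouping set.

    The result is keyed by (grouping set, key), so a single input row
    shows up in one sub-total for every grouping set.
    """
    out = {}
    for cols in sets:
        for key, total in group_by(rows, cols).items():
            out[(cols, key)] = total
    return out


# Hypothetical sample data, for illustration only.
rows = [
    {"product": "a", "region": "east", "units": 1},
    {"product": "a", "region": "west", "units": 2},
    {"product": "b", "region": "east", "units": 4},
]
plain = group_by(rows, ("product",))
multi = grouping_sets(rows, [("product",), ("region",)])
```

With two grouping sets, the sub-totals in multi add up to twice the total in plain, because
every row is counted once per grouping set.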

What can you, the users, do? Get involved in the discussion about what you want in the language.
Encourage the projects to bring their proposed SQL features into this forum for discussion,
and add to the list of patterns and examples on the Streaming page. As in any standards process,
the users help to keep the vendors focused.

I’ll be talking about streaming SQL, planning, and standardization at the Samza meetup in
2 weeks[4], so if any of you are in the Bay Area, please stop by.

Julian

[1] http://calcite.apache.org/docs/stream.html

[2] http://mail-archives.apache.org/mod_mbox/calcite-dev/201506.mbox/%3CCAPSgeETbowxM2TRX0RFxQ_tEAPk2uM=hE0aryWinBtovGwbddQ@mail.gmail.com%3E

[3] https://en.wikipedia.org/wiki/Partition_of_a_set

[4] http://www.meetup.com/Bay-Area-Samza-Meetup/events/228430492/

> On Jan 29, 2016, at 10:29 PM, Wanglan (Lan) <lan.wanglan@huawei.com> wrote:
> 
> Hi to all,
> 
> I am from Huawei and am focusing on data stream processing.
> Recently I noticed that both the Storm community and the Flink community are working
> to use Calcite as a SQL parser so that Storm/Flink can support SQL. They both want to supplement
> or clarify Calcite's Streaming SQL, especially the definition of windows.
> I think that if both communities design Stream SQL syntax separately,
> they will end up with two different syntaxes that represent the same use case.
> Therefore, I am wondering if it is possible to unify this work, i.e. design and complement
> Calcite's Streaming SQL to enrich the window definition so that both Storm and Flink can reuse
> Calcite (Streaming SQL) as their SQL parser for streaming cases with little change.
> What do you think about this idea?
> 

