calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: [DISCUSS] Temporal Feature in Calcite
Date Wed, 02 Aug 2017 18:34:59 GMT
I do think that the best way to write the query is

SELECT  *
   FROM Orders AS o
   JOIN ProductPrices FOR SYSTEM_TIME AS OF o.OrderType AS p
     ON o.productId = p.productId

But do we agree that this goes beyond the standard? It breaches "c) QSTPS shall not contain
a <column reference> or an <SQL parameter reference>.” because “o.OrderType”
is a column reference.

Julian

> On Aug 1, 2017, at 1:33 AM, 伍翀(云邪) <wuchong.wc@alibaba-inc.com> wrote:
> 
> Hi Julian,
> 
> We are interested in both and I have logged CALCITE-1912 for supporting “FOR SYSTEM_TIME
AS OF” in regular queries.
> 
> Regarding to (b), you mean it can be covered by standard. I agree, the query I posted
in the previous mail can be rewrote as a subquery:
> 
> SELECT  *
>    FROM Orders AS o
>    JOIN LATERAL (SELECT * FROM ProductPrices WHERE sysStart <= O.orderTime AND sysEnd
> O.orderTime) AS P
>      ON o.productId = p.productId
> 
> But subquery is too complex for users, the period condition is trivial for users. Using
the “FOR SYSTEM_TIME AS OF”  is much simpler. Yes, you are right that the standard says
it can’t contain a column reference. We initialize this discuss as we would like to "extend"
the standard to simplify such query. What do you think?
> 
> - Jark Wu 
> 
>> 在 2017年8月1日,上午2:58,Julian Hyde <jhyde@apache.org> 写道:
>> 
>> Can you clarify whether you are interested in (a) streaming queries, (b) temporal
non-streaming queries, or both? If (b), it is covered by the standard, and we should follow
the standard, period. If (a), syntax and semantics are still under discussion. 
>> 
>> If you want both, could we perhaps separate the streaming and non-streaming cases?
Could you log a jira case for supporting "FOR SYSTEM_TIME AS OF” in regular (non-streaming)
queries?
>> 
>>> The SQL standard doesn’t explicitly say the <point in time> must be a
constant, just all the databases only support constant.
>> 
>> I disagree. The standard says that it cannot contain a column reference:
>> 
>>> If <query system time period specification> QSTPS is specified, then:
>>> a) The table identified by the <table or query name> shall be a system-versioned
table.
>>> b) If BETWEEN is speci ed and neither SYMMETRIC nor ASYMMETRIC is specified,
then ASYMMETRIC is implicit.
>>> c) QSTPS shall not contain a <column reference> or an <SQL parameter
reference>.
>> 
>> We can go beyond the standard, but let’s be clear what it says.
>> 
>> Julian
>> 
>> 
>> 
>> 
>>> On Jul 31, 2017, at 5:15 AM, 伍翀(云邪) <wuchong.wc@alibaba-inc.com>
wrote:
>>> 
>>> Hi Julian,
>>> 
>>> The syntax you posted looks very good to me. Besides of that, we hope to extend
the <point in time> to support column reference(i.e. LATERAL + temporal  syntax). 
>>> 
>>> The ‘@ADayAgo’ bind variable is not a good example for our requirement. It
is still a query constant, but what we want is a variable timestamp changing with every record.
The example in “Streams, joins and temporal tables” [1] fits our requirement much more.
It uses the table-value function to express and not introduce new syntax. But it  it is not
a standard to manipulate temporal table. And it can only express querying the table content
as of a specified point in time, but can not express a period time (can it?). That’s why
we prefer the standard temporal syntax.
>>> 
>>> The SQL standard doesn’t explicitly say the <point in time> must be a
constant, just all the databases only support constant. But I think, it is very useful to
support column reference. For example: 
>>> 
>>> SELECT  *
>>>  FROM Orders AS o
>>>  JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>>    ON o.productId = p.productId
>>> 
>>> This is actually what we want. 
>>> 
>>> What do you think?
>>> 
>>> [1] https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit#
<https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit#>
>>> - Jark Wu 
>>> 
>>>> 在 2017年7月29日,上午2:16,Julian Hyde <jhyde@apache.org>
写道:
>>>> 
>>>> '@' is specific to SQL Server, and a form of bind variable, so I don't
>>>> think we should go with that approach.
>>>> 
>>>> The standard explicitly says "QSTPS shall not contain a <column
>>>> reference> or an <SQL parameter reference>." We may choose to ignore
>>>> that, but I think we can do what you want without bind variables:
>>>> 
>>>> ... FROM dept FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP - INTERVAL '1' DAY
>>>> 
>>>> will suffice. (Recall that CURRENT_TIMESTAMP is a function, but its
>>>> value is constant throughout the execution of the query. So it behaves
>>>> somewhat like a bind variable.)
>>>> 
>>>> Can we go with the syntax in the standard? (The Microsoft syntax
>>>> diagram has a bug in it - the '| ...' line should be '[ ... ]', I
>>>> think - and Oracle's syntax throws in TIMESTAMP as a keyword, which
>>>> would create "AS OF TIMESTAMP TIMESTAMP" if the expression is a
>>>> timestamp literal.)
>>>> 
>>>> The standard has this:
>>>> 
>>>> <table primary> ::=
>>>>   <table or query name>
>>>>   [ <query system time period specification> ]
>>>>   [ <correlation or recognition> ]
>>>> | ...
>>>> 
>>>> <correlation or recognition> ::=
>>>>   [ AS ] <correlation name>
>>>>   [ <parenthesized derived column list> ]
>>>> | ...
>>>> 
>>>> <query system time period specification> ::=
>>>>   FOR SYSTEM_TIME AS OF <point in time 1>
>>>> | FOR SYSTEM_TIME BETWEEN [ ASYMMETRIC | SYMMETRIC ]
>>>>   <point in time 1> AND <point in time 2>
>>>> | FOR SYSTEM_TIME FROM <point in time 1> TO <point in time 2>
>>>> 
>>>> <point in time 1> ::=
>>>> <point in time>
>>>> 
>>>> <point in time 2> ::=
>>>> <point in time>
>>>> 
>>>> <point in time> ::=
>>>> <datetime value expression>
>>>> 
>>>> Julian
>>>> 
>>>> On Fri, Jul 28, 2017 at 2:49 AM, 伍翀(云邪) <wuchong.wc@alibaba-inc.com>
wrote:
>>>>> Hi Julian,
>>>>> 
>>>>> Thanks for the valuable input. We have read  “Streams, joins and temporal
>>>>> tables” you proposed before I started the discussion. The “temporal
table
>>>>> view” proposed in the doc is very interesting and well-defined. But
we hope
>>>>> to support this feature based on the standard SQL semantics.
>>>>> 
>>>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>>>> you couldn't choose for table A to be at timestamp X and table B to be
>>>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>>>> constant, whereas we require the timestamp to be automatically
>>>>> varying.
>>>>> 
>>>>> 
>>>>> Regarding to “WHOLE QUERY” or “Table”, I didn't find any declaration
in
>>>>> SQL:2011, but I think it effects the Table not the query. In SQL Server’s
>>>>> “FROM” syntax[1], it is a subset of table syntax. In Oracle’s example[2],
it
>>>>> follows after table not the whole query. And I also tried in SQL Server,
the
>>>>> following SQL works well. The TemporalRates and TemporalRates2 are both
>>>>> temporal table with the same schema. The query chooses different timestamp
>>>>> of table TemporalRates and TemporalRates2:
>>>>> 
>>>>> SELECT * FROM TemporalRates2 FOR SYSTEM_TIME AS OF '2017-07-28
>>>>> 07:49:07.5649523' AS T2
>>>>> JOIN TemporalRates FOR SYSTEM_TIME AS OF '2017-07-28 04:26:19.5126849'
AS
>>>>> T1
>>>>> ON T1.id = T2.id;
>>>>> 
>>>>> So I think the only controversial point is it only allows constant
>>>>> timestamp. Then I find the following example from SQL Server doc [3]
:
>>>>> 
>>>>> DECLARE @ADayAgo datetime2
>>>>> SET @ADayAgo = DATEADD (day, -1, sysutcdatetime())
>>>>> /*Comparison between two points in time for subset of rows*/
>>>>> SELECT D_1_Ago.[DeptID], D.[DeptID],
>>>>> D_1_Ago.[DeptName], D.[DeptName],
>>>>> D_1_Ago.[SysStartTime], D.[SysStartTime],
>>>>> D_1_Ago.[SysEndTime], D.[SysEndTime]
>>>>> FROM [dbo].[Department] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago
>>>>> JOIN [Department] AS D ON  D_1_Ago.[DeptID] = [D].[DeptID]
>>>>> AND D_1_Ago.[DeptID] BETWEEN 1 and 5 ;
>>>>> 
>>>>> The time @ADayAgo is a variable not a constant here, so I’m wondering
>>>>> whether it is acceptable to support LATERAL temporal to make the timestamp
>>>>> automatically varying. Which is very useful in many scenarios.
>>>>> 
>>>>> [1] https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql
>>>>> [2] https://oracle-base.com/articles/12c/temporal-validity-12cr1
>>>>> [3]
>>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>>>> 
>>>>> - Jark Wu
>>>>> 
>>>>> 在 2017年7月28日,上午9:03,Julian Hyde <jhyde@apache.org>
写道:
>>>>> 
>>>>> Jark,
>>>>> 
>>>>> One more thing.
>>>>> 
>>>>> I am reading Tyler Akidau’s "Robust Streaming SQL in Apache Apex, Beam,
>>>>> Calcite, & Flink”[1] now. It will be a couple of days before I
have concrete
>>>>> feedback but I will say now that Tyler has been giving temporal joins
a lot
>>>>> of thought, seems to have incorporated my thinking, and generally does
>>>>> excellent work.
>>>>> 
>>>>> So, definitely give his document serious consideration, as I am.
>>>>> 
>>>>> When we all agree that we have the concepts right, I think it likely
that we
>>>>> can embrace the syntax and semantics of temporal support that were
>>>>> introduced in SQL:2011. It’s important that we stay within the SQL
standard
>>>>> for areas that it already covers. And by the way, I added PERIOD support
to
>>>>> Calcite a while ago because that looked useful and un-contraversial.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> [1] http://s.apache.org/streaming-sql-spec
>>>>> 
>>>>> [2] https://issues.apache.org/jira/browse/CALCITE-715
>>>>> 
>>>>> 
>>>>> On Jul 27, 2017, at 10:17 AM, Julian Hyde <jhyde@apache.org> wrote:
>>>>> 
>>>>> This looks very interesting.
>>>>> 
>>>>> Are you familiar with the proposal I created in September, "Streams,
>>>>> joins and temporal tables"[1]? I started thinking about
>>>>> stream-to-table joins, where the tables where time-varying, and ended
>>>>> up with temporal database semantics.
>>>>> 
>>>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>>>> you couldn't choose for table A to be at timestamp X and table B to be
>>>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>>>> constant, whereas we require the timestamp to be automatically
>>>>> varying.
>>>>> 
>>>>> I think you have come to similar conclusions. I would like to hear how
>>>>> your proposal fits with mine.
>>>>> 
>>>>> 
>>>>> Julian
>>>>> 
>>>>> [1]
>>>>> https://lists.apache.org/thread.html/4914256ad347e1d2a72506e2773e59590312d820f04cfcddaffaef1e@%3Cdev.calcite.apache.org%3E
>>>>> 
>>>>> On Wed, Jul 26, 2017 at 11:39 PM, 伍翀(云邪) <wuchong.wc@alibaba-inc.com>
wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> We are looking for the temporal support in Calcite, or so-called
>>>>> system-versioned temporal table. The temporal database feature was
>>>>> introduced in ANSI SQL 2011 [1] and was supported in SQL Server 2017
[2]. A
>>>>> system-versioned temporal table is designed to keep a full history of
data
>>>>> changes and allow easy point in time analysis. It is very useful for
>>>>> scenarios that require tracking history of data changes.
>>>>> 
>>>>> Because system-versioned tables are intended primarily for tracking
>>>>> historical data changes, queries on system-versioned tables often tend
to be
>>>>> concerned with retrieving the table content as of a given point in time
or
>>>>> between any two given points in time. SQL:2011 provides FOR SYSTEM_TIME
AS
>>>>> OF syntactic extension for this specific purpose. For example, the following
>>>>> query retrieves the rows of Emp that were current as of Jan. 2, 2011:
>>>>> SELECT ENo,EName,Sys_Start,Sys_End
>>>>> FROM Emp FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00'
>>>>> 
>>>>> In addition, we need the time expression can be a relational expression
>>>>> whose value is from another table. For example, the following query joins
>>>>> the Orders to the Prices as the price was at the order time:
>>>>> SELECT STREAM *
>>>>> FROM Orders AS o
>>>>> JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>>>>  ON o.productId = p.productId
>>>>> 
>>>>> So I would like to introduce the syntactic extension in Calcite. What
do you
>>>>> think about this? Any comments or suggestions are welcome!
>>>>> 
>>>>> [1]
>>>>> https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf
>>>>> [2]
>>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
>>>>> [3]
>>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>>>> [4] http://sqlhints.com/tag/for-system_time-as-of/
>>>>> 
>>>>> Bests,
>>>>> Jark Wu
>>>>> 
>>>>> 
>>> 
> 


Mime
View raw message