calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "伍翀(云邪)" <wuchong...@alibaba-inc.com>
Subject Re: [DISCUSS] Temporal Feature in Calcite
Date Tue, 01 Aug 2017 08:33:30 GMT
Hi Julian,

We are interested in both and I have logged CALCITE-1912 for supporting “FOR SYSTEM_TIME
AS OF” in regular queries.

Regarding to (b), you mean it can be covered by standard. I agree, the query I posted in the
previous mail can be rewrote as a subquery:

 SELECT  *
    FROM Orders AS o
    JOIN LATERAL (SELECT * FROM ProductPrices WHERE sysStart <= O.orderTime AND sysEnd
> O.orderTime) AS P
      ON o.productId = p.productId

But subquery is too complex for users, the period condition is trivial for users. Using the
“FOR SYSTEM_TIME AS OF”  is much simpler. Yes, you are right that the standard says it
can’t contain a column reference. We initialize this discuss as we would like to "extend"
the standard to simplify such query. What do you think?

- Jark Wu 

> 在 2017年8月1日,上午2:58,Julian Hyde <jhyde@apache.org> 写道:
> 
> Can you clarify whether you are interested in (a) streaming queries, (b) temporal non-streaming
queries, or both? If (b), it is covered by the standard, and we should follow the standard,
period. If (a), syntax and semantics are still under discussion. 
> 
> If you want both, could we perhaps separate the streaming and non-streaming cases? Could
you log a jira case for supporting "FOR SYSTEM_TIME AS OF” in regular (non-streaming) queries?
> 
>> The SQL standard doesn’t explicitly say the <point in time> must be a constant,
just all the databases only support constant.
> 
> I disagree. The standard says that it cannot contain a column reference:
> 
>> If <query system time period specification> QSTPS is specified, then:
>> a) The table identified by the <table or query name> shall be a system-versioned
table.
>> b) If BETWEEN is speci ed and neither SYMMETRIC nor ASYMMETRIC is specified, then
ASYMMETRIC is implicit.
>> c) QSTPS shall not contain a <column reference> or an <SQL parameter reference>.
> 
> We can go beyond the standard, but let’s be clear what it says.
> 
> Julian
> 
> 
> 
> 
>> On Jul 31, 2017, at 5:15 AM, 伍翀(云邪) <wuchong.wc@alibaba-inc.com> wrote:
>> 
>> Hi Julian,
>> 
>> The syntax you posted looks very good to me. Besides of that, we hope to extend the
<point in time> to support column reference(i.e. LATERAL + temporal  syntax). 
>> 
>> The ‘@ADayAgo’ bind variable is not a good example for our requirement. It is
still a query constant, but what we want is a variable timestamp changing with every record.
The example in “Streams, joins and temporal tables” [1] fits our requirement much more.
It uses the table-value function to express and not introduce new syntax. But it  it is not
a standard to manipulate temporal table. And it can only express querying the table content
as of a specified point in time, but can not express a period time (can it?). That’s why
we prefer the standard temporal syntax.
>> 
>> The SQL standard doesn’t explicitly say the <point in time> must be a constant,
just all the databases only support constant. But I think, it is very useful to support column
reference. For example: 
>> 
>> SELECT  *
>>   FROM Orders AS o
>>   JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>     ON o.productId = p.productId
>> 
>> This is actually what we want. 
>> 
>> What do you think?
>> 
>> [1] https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit#
<https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit#>
>> - Jark Wu 
>> 
>>> 在 2017年7月29日,上午2:16,Julian Hyde <jhyde@apache.org> 写道:
>>> 
>>> '@' is specific to SQL Server, and a form of bind variable, so I don't
>>> think we should go with that approach.
>>> 
>>> The standard explicitly says "QSTPS shall not contain a <column
>>> reference> or an <SQL parameter reference>." We may choose to ignore
>>> that, but I think we can do what you want without bind variables:
>>> 
>>> ... FROM dept FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP - INTERVAL '1' DAY
>>> 
>>> will suffice. (Recall that CURRENT_TIMESTAMP is a function, but its
>>> value is constant throughout the execution of the query. So it behaves
>>> somewhat like a bind variable.)
>>> 
>>> Can we go with the syntax in the standard? (The Microsoft syntax
>>> diagram has a bug in it - the '| ...' line should be '[ ... ]', I
>>> think - and Oracle's syntax throws in TIMESTAMP as a keyword, which
>>> would create "AS OF TIMESTAMP TIMESTAMP" if the expression is a
>>> timestamp literal.)
>>> 
>>> The standard has this:
>>> 
>>> <table primary> ::=
>>>    <table or query name>
>>>    [ <query system time period specification> ]
>>>    [ <correlation or recognition> ]
>>>  | ...
>>> 
>>> <correlation or recognition> ::=
>>>    [ AS ] <correlation name>
>>>    [ <parenthesized derived column list> ]
>>>  | ...
>>> 
>>> <query system time period specification> ::=
>>>    FOR SYSTEM_TIME AS OF <point in time 1>
>>>  | FOR SYSTEM_TIME BETWEEN [ ASYMMETRIC | SYMMETRIC ]
>>>    <point in time 1> AND <point in time 2>
>>>  | FOR SYSTEM_TIME FROM <point in time 1> TO <point in time 2>
>>> 
>>> <point in time 1> ::=
>>>  <point in time>
>>> 
>>> <point in time 2> ::=
>>>  <point in time>
>>> 
>>> <point in time> ::=
>>>  <datetime value expression>
>>> 
>>> Julian
>>> 
>>> On Fri, Jul 28, 2017 at 2:49 AM, 伍翀(云邪) <wuchong.wc@alibaba-inc.com>
wrote:
>>>> Hi Julian,
>>>> 
>>>> Thanks for the valuable input. We have read  “Streams, joins and temporal
>>>> tables” you proposed before I started the discussion. The “temporal table
>>>> view” proposed in the doc is very interesting and well-defined. But we
hope
>>>> to support this feature based on the standard SQL semantics.
>>>> 
>>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>>> you couldn't choose for table A to be at timestamp X and table B to be
>>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>>> constant, whereas we require the timestamp to be automatically
>>>> varying.
>>>> 
>>>> 
>>>> Regarding to “WHOLE QUERY” or “Table”, I didn't find any declaration
in
>>>> SQL:2011, but I think it effects the Table not the query. In SQL Server’s
>>>> “FROM” syntax[1], it is a subset of table syntax. In Oracle’s example[2],
it
>>>> follows after table not the whole query. And I also tried in SQL Server,
the
>>>> following SQL works well. The TemporalRates and TemporalRates2 are both
>>>> temporal table with the same schema. The query chooses different timestamp
>>>> of table TemporalRates and TemporalRates2:
>>>> 
>>>> SELECT * FROM TemporalRates2 FOR SYSTEM_TIME AS OF '2017-07-28
>>>> 07:49:07.5649523' AS T2
>>>> JOIN TemporalRates FOR SYSTEM_TIME AS OF '2017-07-28 04:26:19.5126849' AS
>>>> T1
>>>> ON T1.id = T2.id;
>>>> 
>>>> So I think the only controversial point is it only allows constant
>>>> timestamp. Then I find the following example from SQL Server doc [3] :
>>>> 
>>>> DECLARE @ADayAgo datetime2
>>>> SET @ADayAgo = DATEADD (day, -1, sysutcdatetime())
>>>> /*Comparison between two points in time for subset of rows*/
>>>> SELECT D_1_Ago.[DeptID], D.[DeptID],
>>>> D_1_Ago.[DeptName], D.[DeptName],
>>>> D_1_Ago.[SysStartTime], D.[SysStartTime],
>>>> D_1_Ago.[SysEndTime], D.[SysEndTime]
>>>> FROM [dbo].[Department] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago
>>>> JOIN [Department] AS D ON  D_1_Ago.[DeptID] = [D].[DeptID]
>>>> AND D_1_Ago.[DeptID] BETWEEN 1 and 5 ;
>>>> 
>>>> The time @ADayAgo is a variable not a constant here, so I’m wondering
>>>> whether it is acceptable to support LATERAL temporal to make the timestamp
>>>> automatically varying. Which is very useful in many scenarios.
>>>> 
>>>> [1] https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql
>>>> [2] https://oracle-base.com/articles/12c/temporal-validity-12cr1
>>>> [3]
>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>>> 
>>>> - Jark Wu
>>>> 
>>>> 在 2017年7月28日,上午9:03,Julian Hyde <jhyde@apache.org>
写道:
>>>> 
>>>> Jark,
>>>> 
>>>> One more thing.
>>>> 
>>>> I am reading Tyler Akidau’s "Robust Streaming SQL in Apache Apex, Beam,
>>>> Calcite, & Flink”[1] now. It will be a couple of days before I have
concrete
>>>> feedback but I will say now that Tyler has been giving temporal joins a lot
>>>> of thought, seems to have incorporated my thinking, and generally does
>>>> excellent work.
>>>> 
>>>> So, definitely give his document serious consideration, as I am.
>>>> 
>>>> When we all agree that we have the concepts right, I think it likely that
we
>>>> can embrace the syntax and semantics of temporal support that were
>>>> introduced in SQL:2011. It’s important that we stay within the SQL standard
>>>> for areas that it already covers. And by the way, I added PERIOD support
to
>>>> Calcite a while ago because that looked useful and un-contraversial.
>>>> 
>>>> Julian
>>>> 
>>>> [1] http://s.apache.org/streaming-sql-spec
>>>> 
>>>> [2] https://issues.apache.org/jira/browse/CALCITE-715
>>>> 
>>>> 
>>>> On Jul 27, 2017, at 10:17 AM, Julian Hyde <jhyde@apache.org> wrote:
>>>> 
>>>> This looks very interesting.
>>>> 
>>>> Are you familiar with the proposal I created in September, "Streams,
>>>> joins and temporal tables"[1]? I started thinking about
>>>> stream-to-table joins, where the tables where time-varying, and ended
>>>> up with temporal database semantics.
>>>> 
>>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>>> you couldn't choose for table A to be at timestamp X and table B to be
>>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>>> constant, whereas we require the timestamp to be automatically
>>>> varying.
>>>> 
>>>> I think you have come to similar conclusions. I would like to hear how
>>>> your proposal fits with mine.
>>>> 
>>>> 
>>>> Julian
>>>> 
>>>> [1]
>>>> https://lists.apache.org/thread.html/4914256ad347e1d2a72506e2773e59590312d820f04cfcddaffaef1e@%3Cdev.calcite.apache.org%3E
>>>> 
>>>> On Wed, Jul 26, 2017 at 11:39 PM, 伍翀(云邪) <wuchong.wc@alibaba-inc.com>
wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> We are looking for the temporal support in Calcite, or so-called
>>>> system-versioned temporal table. The temporal database feature was
>>>> introduced in ANSI SQL 2011 [1] and was supported in SQL Server 2017 [2].
A
>>>> system-versioned temporal table is designed to keep a full history of data
>>>> changes and allow easy point in time analysis. It is very useful for
>>>> scenarios that require tracking history of data changes.
>>>> 
>>>> Because system-versioned tables are intended primarily for tracking
>>>> historical data changes, queries on system-versioned tables often tend to
be
>>>> concerned with retrieving the table content as of a given point in time or
>>>> between any two given points in time. SQL:2011 provides FOR SYSTEM_TIME AS
>>>> OF syntactic extension for this specific purpose. For example, the following
>>>> query retrieves the rows of Emp that were current as of Jan. 2, 2011:
>>>> SELECT ENo,EName,Sys_Start,Sys_End
>>>> FROM Emp FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00'
>>>> 
>>>> In addition, we need the time expression can be a relational expression
>>>> whose value is from another table. For example, the following query joins
>>>> the Orders to the Prices as the price was at the order time:
>>>> SELECT STREAM *
>>>> FROM Orders AS o
>>>> JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>>>   ON o.productId = p.productId
>>>> 
>>>> So I would like to introduce the syntactic extension in Calcite. What do
you
>>>> think about this? Any comments or suggestions are welcome!
>>>> 
>>>> [1]
>>>> https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf
>>>> [2]
>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
>>>> [3]
>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>>> [4] http://sqlhints.com/tag/for-system_time-as-of/
>>>> 
>>>> Bests,
>>>> Jark Wu
>>>> 
>>>> 
>> 


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message