calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fabian Hueske <fhue...@gmail.com>
Subject Re: New type of window semantics
Date Tue, 27 Sep 2016 09:31:53 GMT
Hi Radu,

sliding windows as described by Julian will emit exactly one row for each
incoming row.
In the scenario you describe only one row will be emitted when ordN6
arrives (otherwise, each input row would result in five emitted rows).

So sliding windows seem to be what you are looking for.

Best, Fabian

2016-09-27 10:59 GMT+02:00 Radu Tudoran <radu.tudoran@huawei.com>:

> Hi,
>
> Thanks for this points.
> I am not sure if I really understood the implications of using this option
> in the stream mode. I got the point that if we have 20 rows then we have 20
> outputs. However, I wonder what happens when a new record comes in and we
> have the query you proposed
>
> SELECT SREAM orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS 5
> PRECEDING)
>   FROM Orders
>
>
> Assuming we have up to moment T te following 5 orders:
>
> ordN1, ordN2, ordN3, ordN4, ordN5
>
> and we get ordN6 at moment T+1
> ..will the query provide only one result corresponding to ordN6 and thus
> average over ordN2, ordN3, ordN4, ordN5, ordN6....or because ordN2 to ordN5
> are still in the system the query will return 5 results?
>
>
> If the query answer is 1 output in this case corersponding to element
> ordN6 then indeed it can do the job for this scenario.
>
>
>
>
>
> -----Original Message-----
> From: Julian Hyde [mailto:jhyde@apache.org]
> Sent: Tuesday, September 27, 2016 2:41 AM
> To: dev@calcite.apache.org
> Subject: Re: New type of window semantics
>
> Have you considered the sliding window, which is already part of standard
> SQL?  We propose to support it in streaming SQL also. Here is an example:
>
>   SELECT orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS 5
> PRECEDING)
>   FROM Orders
>
> (This is a non-streaming query, but you can add the STREAM keyword to get
> a streaming query.)
>
> Given orders 1 .. 20, then order 10 would show the average for orders 5 ..
> 10 inclusive, order 11 would show the average for orders 6 .. 11, and so
> forth.
>
> In streaming queries, windows are often used in the GROUP BY clause, but
> we do not use a GROUP BY here. The OVER clause with sliding windows does
> not aggregate rows. If 20 rows come in, then 20 rows go out. It makes
> sense, because each row cannot have its own window if multiple rows are
> squashed into one.
>
> Julian
>
>
>
> > On Sep 26, 2016, at 12:53 AM, Radu Tudoran <radu.tudoran@huawei.com>
> wrote:
> >
> > Hi,
> >
> > First of all let me introduce myself - My name is Radu Tudoran and I am
> working in the field of Big Data processing with a high focus on streaming
> and more recently in the area of SQL.
> >
> > I wanted to raise a question/proposal for discussion in the community:
> >
> > Based on our requirements I realized that I would need to create a
> window (e.g. hop window) that would move on every incoming element based.
> The syntax that I have in mind for it is
> >
> > HOP(column_name, # EVENT , INTERVAL # )   (or should it rather be #
> ELEMENT instead of EVENT?)
> >
> > I wanted to check with you what do you think about such a grammar to go
> directly in Calcite? I think it is relevant for streaming scenarios where
> you do not necessary have events coming at regular time interval but you
> would still like to react on every event.
> > As an example you can consider a stock market application where you
> would always compute for every new offer the average over the last hour.
> >
> > Best regards,
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message