calcite-dev mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: New type of window semantics
Date Tue, 27 Sep 2016 11:30:23 GMT
Sliding windows can be partitioned, but the partitioning is not specified
in a GROUP BY clause. It goes into a PARTITION BY clause inside OVER.
GROUP BY returns one record per group, which is not what you want.
Check the syntax of the OVER clause, e.g., here [1] (see windowSpec at the
end).

[1] http://calcite.apache.org/docs/reference.html
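
As a rough illustration of the point above (this is not Calcite code), the
sketch below runs the partitioned form of the query in SQLite through
Python's sqlite3 module, used here only as a stand-in for a streaming engine.
Assumptions: the STREAM keyword is dropped because SQLite does not have it,
the Orders columns and the sample rows are made up, and the SQLite build
supports window functions (3.25 or later). The thing to notice is that
PARTITION BY order_type inside OVER keeps one output row per input row, while
the average is computed separately for each order_type.

```python
import sqlite3

# Hypothetical sample data: (orderId, order_type, orderTime, price).
ORDERS = [
    (1, "buy", 1, 10.0),
    (2, "sell", 2, 100.0),
    (3, "buy", 3, 20.0),
    (4, "sell", 4, 200.0),
    (5, "buy", 5, 30.0),
]

# Partitioning goes inside OVER, not into GROUP BY.
# STREAM is omitted because SQLite is only a non-streaming stand-in.
QUERY = """
SELECT orderId, order_type, price,
       AVG(price) OVER (PARTITION BY order_type
                        ORDER BY orderTime
                        ROWS 5 PRECEDING) AS avg_price
FROM Orders
ORDER BY orderTime
"""


def sliding_avg_per_type(rows):
    """Load rows into an in-memory Orders table and run QUERY on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Orders ("
            "orderId INTEGER, order_type TEXT, orderTime INTEGER, price REAL)"
        )
        conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in sliding_avg_per_type(ORDERS):
        print(row)
```

With these sample rows the query returns five rows, one per order, and each
average only covers the current and earlier orders of the same order_type.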



2016-09-27 13:20 GMT+02:00 Radu Tudoran <radu.tudoran@huawei.com>:

> Hi,
>
> Thanks for the answer.
> As a follow up question - is it possible to use a GROUP BY clause after
> the previous query?
>
> SELECT STREAM orderId, price,
>    AVG(price) OVER (ORDER BY orderTime ROWS 5 PRECEDING)
>    FROM Orders
>    GROUP BY order_type
>
>
> I am asking because I would like to know whether this would make it
> possible to implement this over KeyedStream(s), such as the ones in Flink,
> plus the windows.
>
>
>
> -----Original Message-----
> From: Fabian Hueske [mailto:fhueske@gmail.com]
> Sent: Tuesday, September 27, 2016 11:32 AM
> To: dev@calcite.apache.org
> Subject: Re: New type of window semantics
>
> Hi Radu,
>
> sliding windows as described by Julian will emit exactly one row for each
> incoming row.
> In the scenario you describe only one row will be emitted when ordN6
> arrives (otherwise, each input row would result in five emitted rows).
>
> So sliding windows seem to be what you are looking for.
>
> Best, Fabian
>
> 2016-09-27 10:59 GMT+02:00 Radu Tudoran <radu.tudoran@huawei.com>:
>
> > Hi,
> >
> > Thanks for these points.
> > I am not sure if I really understood the implications of using this
> > option in the stream mode. I got the point that if we have 20 rows
> > then we have 20 outputs. However, I wonder what happens when a new
> > record comes in and we have the query you proposed
> >
> > SELECT STREAM orderId, price,
> >   AVG(price) OVER (ORDER BY orderTime ROWS 5 PRECEDING)
> >   FROM Orders
> >
> >
> > Assuming that up to moment T we have the following 5 orders:
> >
> > ordN1, ordN2, ordN3, ordN4, ordN5
> >
> > and we get ordN6 at moment T+1. Will the query provide only one result,
> > corresponding to ordN6 (and thus the average over ordN2, ordN3, ordN4,
> > ordN5, ordN6), or, because ordN2 to ordN5 are still in the system, will
> > the query return 5 results?
> >
> >
> > If the query produces 1 output in this case, corresponding to element
> > ordN6, then indeed it can do the job for this scenario.
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: Julian Hyde [mailto:jhyde@apache.org]
> > Sent: Tuesday, September 27, 2016 2:41 AM
> > To: dev@calcite.apache.org
> > Subject: Re: New type of window semantics
> >
> > Have you considered the sliding window, which is already part of
> > standard SQL?  We propose to support it in streaming SQL also. Here is
> > an example:
> >
> >   SELECT orderId, price,
> >     AVG(price) OVER (ORDER BY orderTime ROWS 5 PRECEDING)
> >   FROM Orders
> >
> > (This is a non-streaming query, but you can add the STREAM keyword to
> > get a streaming query.)
> >
> > Given orders 1 .. 20, order 10 would show the average for orders 5 .. 10
> > inclusive, order 11 would show the average for orders 6 .. 11, and so
> > forth.
> >
> > In streaming queries, windows are often used in the GROUP BY clause,
> > but we do not use a GROUP BY here. The OVER clause with sliding
> > windows does not aggregate rows. If 20 rows come in, then 20 rows go
> > out. It makes sense, because each row cannot have its own window if
> > multiple rows are squashed into one.
> >
> > Julian
> >
> >
> >
> > > On Sep 26, 2016, at 12:53 AM, Radu Tudoran <radu.tudoran@huawei.com> wrote:
> > >
> > > Hi,
> > >
> > > First of all, let me introduce myself: my name is Radu Tudoran and I am
> > > working in the field of Big Data processing with a strong focus on
> > > streaming and, more recently, in the area of SQL.
> > >
> > > I wanted to raise a question/proposal for discussion in the community:
> > >
> > > Based on our requirements, I realized that I would need to create a
> > > window (e.g. a hop window) that would move on every incoming element.
> > > The syntax that I have in mind for it is
> > >
> > > HOP(column_name, # EVENT, INTERVAL #)
> > > (or should it rather be # ELEMENT instead of EVENT?)
> > >
> > > I wanted to check what you think about adding such a grammar directly
> > > to Calcite. I think it is relevant for streaming scenarios where events
> > > do not necessarily arrive at regular time intervals but you would still
> > > like to react to every event.
> > > As an example, consider a stock market application where, for every new
> > > offer, you would always compute the average over the last hour.
> > >
> > > Best regards,
> >
> >
>
