drill-dev mailing list archives

From Jacques Nadeau <jacques.dr...@gmail.com>
Subject Re: Moving & regular aggregates
Date Tue, 29 Jan 2013 02:03:17 GMT
I've been trying to clean up the syntax doc some.  I saw Ted added a bunch
of comments.  I'll go through and update the operators so that we have
discrete aggregation operators.

How do people feel about CollapseAggregate and RunningAggregate?  I'm
inclined to stay away from the GroupBy name since a traditional SQL group
by is really Group followed by CollapseAggregate.  I've also been
considering using the MS SQL Server naming of "Segment" instead of "Group".
 Anyone have any opinions on that?
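
To make the "Group followed by CollapseAggregate" reading concrete, here is a rough Python sketch. It is illustrative only and is not the logical plan syntax; the function names `group` and `collapse_aggregate`, the field names, and the sample rows are all made up for the example.

```python
from itertools import groupby
from operator import itemgetter

def group(rows, key):
    """Group/Segment step: order rows by key and split them into segments.
    No rows are dropped here; each segment keeps all of its input rows."""
    ordered = sorted(rows, key=itemgetter(key))
    return [(k, list(seg)) for k, seg in groupby(ordered, key=itemgetter(key))]

def collapse_aggregate(segments, field):
    """CollapseAggregate step: reduce each segment to a single output row."""
    return [{"key": k, "total": sum(r[field] for r in seg)} for k, seg in segments]

sales = [
    {"product": "a", "amount": 10},
    {"product": "b", "amount": 5},
    {"product": "a", "amount": 7},
]

# A SQL-style GROUP BY expressed as the two steps chained together.
totals = collapse_aggregate(group(sales, "product"), "amount")
```

Keeping the two steps separate means the same segmented input could also feed a running aggregate, which is the argument for not naming either operator GroupBy.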

Other comments below...


On Mon, Jan 28, 2013 at 11:41 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> On Mon, Jan 28, 2013 at 10:30 AM, Julian Hyde <julianhyde@gmail.com>
> wrote:
>
> > I think it's a mistake to use the same operator for regular and moving
> > aggregates. (Moving aggregates are also known as running aggregates.
> There
> > are sub-types called sliding and paged.)
> >
>
> I think that we need a distinction.  Different operator is fine.  Flag is
> fine.  I tend toward different operator, not because of the different kind
> of output but rather because of the different argument pattern.  Same
> answer, different rationale.
>
>
> > A regular aggregate would be "Compute the total sales for each product
> > each month".
> >
>
> OK.  We call this aggregate now.  The argument is a segment reference in
> the current logical plan.
>
>
> > A moving aggregate would be "For each sales order, compute the sales of
> > that product in that region over the past 20 days".
> >
>
> OK.  We need a new name.  Nominations are open.
>
> >> CollapseAggregate is my vote.


>
> > Consider their output. Regular aggregates output the grouping keys and
> the
> > aggregates. They can't output the input rows because they have been
> > aggregated into a single group. Moving aggregates output the original
> rows
> > PLUS any aggregates they compute.
> >
>

>> Agreed.  Let's split this out.
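
A minimal Python sketch of the output difference, under assumed names (`running_aggregate`, `running_total`, and the sample rows are hypothetical, and this is not the logical plan syntax). The moving form emits every input row with the aggregate appended, where a collapsing aggregate would emit only one row per key.

```python
def running_aggregate(rows, key, field):
    """Emit each input row plus a running total within its segment.
    Rows are assumed to arrive already in the desired order."""
    totals = {}
    out = []
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[field]
        # Original columns are preserved; the aggregate is an extra column.
        out.append({**r, "running_total": totals[r[key]]})
    return out

sales = [
    {"product": "a", "amount": 10},
    {"product": "b", "amount": 5},
    {"product": "a", "amount": 7},
]

with_totals = running_aggregate(sales, "product", "amount")
```

Three rows go in and three rows come out, each still carrying its `product` and `amount`.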


>
> And there is a third kind which is the running aggregate, but those are
> plausibly a windowed aggregate with infinite extent backwards.
>

>> One option of the windowing operator is a full backward window.

>
> Consider how they are specified. Regular aggregates are specified by a set
> > of grouping keys, and a set of aggregate functions.  Moving aggregates
> are
> > specified by the grouping keys (called partition keys in the SQL
> standard,
> > for what it's worth) but also specifications of ordering (for rank etc.)
> > and window length (10 rows, or 2 hours).
> >
>
>
I've been using "segment key" for what the standard calls the partition key,
primarily because "partition" often means something else to a lot of people...


> You raise an interesting point here.  The current argument structure is
> deficient.  We currently have before and after.  I think that should be
> restated to start and end indexes with negative indexes to indicate
> preceding records.  Your point here implies that we should also have a
> starting expression and an ending expression.
>
>
>> I'm not sure we're deficient.  The before and after are based within the
segment key.
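
One way to picture the two phrasings is the sketch below, in Python rather than the logical plan syntax. It assumes the window is given as start and end offsets relative to the current record, with negative offsets meaning preceding records, and that offsets are clipped at the segment boundaries. The function name `window_sum`, the `start=None` convention for a full backward window, and the sample data are all hypothetical.

```python
def window_sum(segment, field, start, end):
    """For each row i of ONE segment, sum rows i+start .. i+end inclusive.

    Negative offsets point at preceding rows; start=None means a full
    backward window. Offsets are clipped to the segment, so a window
    never reaches into a neighbouring segment.
    """
    values = [r[field] for r in segment]
    out = []
    for i, row in enumerate(segment):
        lo = 0 if start is None else max(0, i + start)
        hi = min(len(values) - 1, i + end)
        # max(..., 0) keeps a window that ends before the segment empty
        # instead of letting a negative slice bound wrap around.
        out.append({**row, "window_sum": sum(values[lo:max(hi + 1, 0)])})
    return out

segment = [{"amount": a} for a in (1, 2, 3, 4)]

# "before 1, after 0" restated as start=-1, end=0.
sliding = window_sum(segment, "amount", start=-1, end=0)
# Full backward window: behaves like a running aggregate.
running = window_sum(segment, "amount", start=None, end=0)
```

Under these assumptions "before 1, after 0" and "start -1, end 0" describe the same window, so the index form is a restatement rather than a new capability; expression-based bounds such as "2 hours" would still need something beyond row offsets.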

>
> > Given this, I would separate the aggregate operator into a GroupBy
> > operator and a MovingAggregate operator. (The MovingAggregate operator
> > might have sub-types such as sliding and paged, as I mentioned above.)
>
>
> Should I take a stab at a revised specification for aggregation?  I dislike
> the groupBy name for an aggregation, but could be convinced by a show of
> hands.
>

>>I can take a shot at this.
