drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <julianh...@gmail.com>
Subject Moving & regular aggregates
Date Mon, 28 Jan 2013 18:30:46 GMT
I think it's a mistake to use the same operator for regular and moving aggregates. (Moving
aggregates are also known as running aggregates. There are sub-types called sliding and paged.)

An regular aggregate would be "Compute the total sales for each product each month".

A moving aggregate would be "For each sales order, compute the sales of that product in that
region over the past 20 days".

Consider their output. Regular aggregates output the grouping keys and the aggregates. They
can't output the input rows because they have been aggregated into a single group. Moving
aggregates output the original rows PLUS any aggregates they compute.

Consider how they are specified. Regular aggregates are specified by a set of grouping keys,
and a set of aggregate functions.  Moving aggregates are specified by the grouping keys (called
partition keys in the SQL standard, for what it's worth) but also specifications of ordering
(for rank etc.) and window length (10 rows, or 2 hours).

Consider their internals. Regular aggregates typically use a hashmap. Moving aggregates typically
use a hashmap but also make heavy use of sorted lists.

Given this, I would separate the aggregate operator into a GroupBy operator and a MovingAggregate
operator. (The MovingAggregate operator might have sub-types such as sliding and paged, as
I mentioned above.)

By the way, for both types of aggregates, it makes sense to have an empty set of grouping
keys. So, Ted is right on the money with https://issues.apache.org/jira/browse/DRILL-22.

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message