flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kurt Young (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (FLINK-12192) Add support for generating optimized logical plan for grouping sets and distinct aggregate
Date Fri, 19 Apr 2019 07:37:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-12192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Kurt Young closed FLINK-12192.
------------------------------
       Resolution: Fixed
    Fix Version/s: 1.9.0

fixed in 1c55f7a7e18e03f5b777c9fbe3234055904b00d6

> Add support for generating optimized logical plan for grouping sets and distinct aggregate
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-12192
>                 URL: https://issues.apache.org/jira/browse/FLINK-12192
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / Planner
>            Reporter: godfrey he
>            Assignee: godfrey he
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue aims to supports generating optimized logical plan for grouping sets and distinct
aggregate. (mentioned in FLINK-12076 and FLINK-12098)
> for batch, query with distinct aggregate will be rewritten into two non-distinct aggregates
by extended [AggregateExpandDistinctAggregatesRule|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.java],
the first aggregate computes the distinct key and non-distinct aggregate function, and the
second aggregate computes the distinct aggregate function based on first aggregate result. 
The first aggregate has grouping sets if there are more than one distinct aggregate functions
on different fields.
> for stream, query with distinct aggregate is handled by SplitAggregateRule in FLINK-12161.
> query with grouping sets (or cube, rollup) will be rewritten into a regular aggregate
with expand, and the expand node will duplicates the input data for each simple group.
>  e.g.
> {noformat}
> schema:
> MyTable: a: INT, b: BIGINT, c: VARCHAR(32), d: VARCHAR(32)
>  Original records:
> +-----+-----+-----+-----+
> |  a  |  b  |  c  |  d  |
> +-----+-----+-----+-----+
> |  1  |  1  |  c1 |  d1 |
> +-----+-----+-----+-----+
> |  1  |  2  |  c1 |  d2 |
> +-----+-----+-----+-----+
> |  2  |  1  |  c1 |  d1 |
> +-----+-----+-----+-----+
> SELECT a, c, SUM(b) as b FROM MyTable GROUP BY GROUPING SETS (a, c)
> logical plan after expanded:
> LogicalCalc(expr#0..3=[{inputs}], proj#0..1=[{exprs}], b=[$t3])
>     LogicalAggregate(group=[{0, 2, 3}], groups=[[]], b=[SUM($1)])
>         LogicalExpand(projects=[{a=[$0], b=[$1], c=[null], $e=[1]}, {a=[null], b=[$1],
c=[$2], $e=[2]}])
>             LogicalNativeTableScan(table=[[builtin, default, MyTable]])
> notes:
> '$e = 1' is equivalent to 'group by a'
> '$e = 2' is equivalent to 'group by c'
> expanded records:
> +-----+-----+-----+-----+
> |  a  |  b  |  c  | $e  |
> +-----+-----+-----+-----+        ---+---
> |  1  |  1  | null|  1  |           |
> +-----+-----+-----+-----+  records expanded by record1
> | null|  1  |  c1 |  2  |           |
> +-----+-----+-----+-----+        ---+---
> |  1  |  2  | null|  1  |           |
> +-----+-----+-----+-----+  records expanded by record2
> | null|  2  |  c1 |  2  |           |
> +-----+-----+-----+-----+        ---+---
> |  2  |  1  | null|  1  |           |
> +-----+-----+-----+-----+  records expanded by record3
> | null|  1  |  c1 |  2  |           |
> +-----+-----+-----+-----+        ---+---
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message