flink-issues mailing list archives

From "vinoyang (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (FLINK-9673) Improve State efficiency of bounded OVER window operators
Date Fri, 22 Mar 2019 11:38:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vinoyang reassigned FLINK-9673:

    Assignee: vinoyang

> Improve State efficiency of bounded OVER window operators
> ---------------------------------------------------------
>                 Key: FLINK-9673
>                 URL: https://issues.apache.org/jira/browse/FLINK-9673
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Legacy Planner
>            Reporter: Fabian Hueske
>            Assignee: vinoyang
>            Priority: Major
> Currently, the implementations of bounded OVER window aggregations store the complete
> input rows for the bounded interval. For example, consider the query:
> {code:sql}
> SELECT user_id,
>        count(action) OVER (PARTITION BY user_id ORDER BY rowtime RANGE INTERVAL '14' DAY PRECEDING) AS action_count,
>        rowtime
> FROM (
>     SELECT rowtime, user_id, action, val1, val2, val3, val4 FROM user
> )
> {code}
> The complete records with schema {{(rowtime, user_id, action, val1, val2, val3, val4)}}
> are kept in state for 14 days so that they can be retracted from the accumulators
> once they fall out of the window.
> However, it would be sufficient to store only those fields that are required for the
> aggregations, i.e., {{action}} in the example above. All other fields could be set to {{null}},
> which would significantly reduce the amount of data that needs to be stored in state.
> This improvement can be applied to all four combinations of bounded [rowtime|proctime]
> [range|rows] OVER windows.
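
The proposal above can be illustrated with a small, language-neutral sketch. This is not Flink code and not the planner's implementation: the class `BoundedCountOver`, the helper `prune_row`, the field order in `SCHEMA`, and the assumption that rows arrive in rowtime order for a single key are all hypothetical and chosen only for illustration. The sketch buffers rows by rowtime, sets every field that the aggregation does not need to `None` before buffering, and still retracts correctly when rows leave the window.

```python
from collections import deque

# Hypothetical field order, taken from the schema in the example query.
SCHEMA = ("rowtime", "user_id", "action", "val1", "val2", "val3", "val4")
ACTION_IDX = SCHEMA.index("action")


def prune_row(row, required_fields, schema=SCHEMA):
    """Return a copy of `row` where every field not required is set to None."""
    return tuple(
        value if name in required_fields else None
        for name, value in zip(schema, row)
    )


class BoundedCountOver:
    """Toy count(action) over a bounded RANGE window for one partition key.

    Assumes rows arrive in ascending rowtime order (no watermark handling).
    """

    def __init__(self, range_ms, required_fields=("action",)):
        self.range_ms = range_ms
        self.required = frozenset(required_fields)
        self.buffer = deque()  # (rowtime, pruned row); stands in for keyed state
        self.count = 0         # accumulator for count(action)

    def process(self, row):
        rowtime = row[0]
        # Retract rows that are older than the lower window bound.
        while self.buffer and self.buffer[0][0] < rowtime - self.range_ms:
            _, old = self.buffer.popleft()
            if old[ACTION_IDX] is not None:
                self.count -= 1
        # Buffer only the fields needed for a later retraction.
        pruned = prune_row(row, self.required)
        self.buffer.append((rowtime, pruned))
        if pruned[ACTION_IDX] is not None:  # count() ignores NULL values
            self.count += 1
        return self.count
```

The rowtime is kept as the buffer key rather than inside the stored row, so the stored row itself only needs `action`. For a different aggregate, `required_fields` would list whichever columns its retraction logic reads.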

This message was sent by Atlassian JIRA