flink-issues mailing list archives

From "Kurt Young (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (FLINK-11943) Support TopN feature for SQL
Date Mon, 20 May 2019 01:27:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-11943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kurt Young closed FLINK-11943.
       Resolution: Duplicate
    Fix Version/s: 1.9.0

> Support TopN feature for SQL
> ----------------------------
>                 Key: FLINK-11943
>                 URL: https://issues.apache.org/jira/browse/FLINK-11943
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / Runtime
>            Reporter: Jark Wu
>            Priority: Major
>             Fix For: 1.9.0
> TopN is a frequently used feature in data analysis. We can use ORDER BY + LIMIT to easily
> express a TopN query, e.g. {{SELECT * FROM T ORDER BY amount DESC LIMIT 10}}.
> But this is a global TopN. There is also strong demand for per-group TopN, for example the
> top 10 shops in each category. To avoid introducing new syntax for this, we would like to
> express it with traditional syntax: {{ROW_NUMBER}} over a window, plus a {{FILTER}} on the
> row number to limit the rows kept per group.
> For example:
> SELECT *
> FROM (
>   SELECT category, shopId, sales,
>          ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales ASC) AS rownum
>   FROM shop_sales
> )
> WHERE rownum <= 10
> This issue aims to optimize this query into a single {{Rank}} node instead of {{Over}} plus
> {{Calc}}, and to translate the {{Rank}} node into physical operators.
> There are several possible optimizations for the rank operator, depending on the input of
> the {{Rank}} node. We would like to implement a basic, one-size-fits-all version first and
> make the performance improvements later.
> Here is a brief design doc: https://docs.google.com/document/d/14JCV6X6hcpoA51loprgntZNxQ2NmnDLucxgGY8xVDuI/edit#
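
The per-group TopN semantics described in the issue (number the rows within each partition, then keep those whose row number is at most N) can be illustrated with a minimal Python sketch. This is not Flink's {{Rank}} operator or any part of its implementation; the function name `top_n_per_group` and the sample rows are hypothetical, and the ascending order simply mirrors the `ORDER BY sales ASC` in the example query.

```python
import heapq
from collections import defaultdict


def top_n_per_group(rows, n, group="category", key="sales"):
    """Hypothetical helper: emulate ROW_NUMBER() OVER (PARTITION BY group
    ORDER BY key ASC) followed by a filter rownum <= n."""
    # Partition the input rows by the grouping column.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[group]].append(row)

    result = []
    # Iterate groups in sorted order so the output is deterministic.
    for group_value in sorted(partitions):
        # nsmallest returns at most n rows in ascending order of the key,
        # which corresponds to the rows with rownum 1..n.
        ranked = heapq.nsmallest(n, partitions[group_value], key=lambda r: r[key])
        for rownum, row in enumerate(ranked, start=1):
            result.append({**row, "rownum": rownum})
    return result


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    shop_sales = [
        {"category": "books", "shopId": "b1", "sales": 30},
        {"category": "books", "shopId": "b2", "sales": 10},
        {"category": "books", "shopId": "b3", "sales": 20},
        {"category": "toys", "shopId": "t1", "sales": 5},
    ]
    for row in top_n_per_group(shop_sales, n=2):
        print(row)
```

This sketch materializes each partition before ranking, which is only workable for bounded input. A streaming operator would presumably have to maintain the ranking incrementally in state, and that is likely where the input-dependent optimizations mentioned in the issue come in.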

This message was sent by Atlassian JIRA
