hive-issues mailing list archives

From "Teddy Choi (JIRA)" <>
Subject [jira] [Updated] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator
Date Tue, 17 Jul 2018 09:15:00 GMT


Teddy Choi updated HIVE-17896:
    Attachment: HIVE-17896.12.patch

> TopNKey: Create a standalone vectorizable TopNKey operator
> ----------------------------------------------------------
>                 Key: HIVE-17896
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Operators
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>            Assignee: Teddy Choi
>            Priority: Major
>         Attachments: HIVE-17896.1.patch, HIVE-17896.10.patch, HIVE-17896.11.patch, HIVE-17896.12.patch,
HIVE-17896.3.patch, HIVE-17896.4.patch, HIVE-17896.5.patch, HIVE-17896.6.patch, HIVE-17896.7.patch,
HIVE-17896.8.patch, HIVE-17896.9.patch
> For TPC-DS Query27, the TopN operation is delayed by the group-by: the group-by operator
buffers up all the rows before 99% of them are discarded by the TopN hash within the
ReduceSink operator.
> The RS TopN operator is very restrictive, as it only supports filtering on the shuffle
keys. It is better to filter before the vectors are broken into rows and the isRepeating
property is lost.
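A minimal Java sketch of why filtering while the data is still vectorized pays off. It assumes a hypothetical, simplified batch class (`KeyBatch`, not Hive's `VectorizedRowBatch`) and a fixed cutoff standing in for the current N-th best key. When the key column is repeating, a single comparison keeps or drops the whole batch; once the batch has been broken into rows, every row needs its own check.

```java
/**
 * Hypothetical, simplified stand-in for a vectorized batch: one long key
 * column plus a selection vector. Not Hive's actual VectorizedRowBatch API.
 */
final class KeyBatch {
  final long[] keys;
  final boolean isRepeating; // true => keys[0] holds the value for every row
  final int[] selected;      // indices of rows still live in the batch
  int size;                  // number of live rows

  KeyBatch(long[] keys, boolean isRepeating, int size) {
    this.keys = keys;
    this.isRepeating = isRepeating;
    this.size = size;
    this.selected = new int[size];
    for (int i = 0; i < size; i++) selected[i] = i;
  }
}

final class BatchTopNFilter {
  /**
   * Keeps only rows whose key is below the cutoff. The fixed cutoff is an
   * assumption standing in for "the current N-th smallest key"; a real
   * top-N filter would update it as better keys arrive.
   */
  static void filter(KeyBatch b, long cutoff) {
    if (b.isRepeating) {
      // One comparison decides the whole batch: keep every row or drop every row.
      if (b.keys[0] >= cutoff) b.size = 0;
      return;
    }
    // Non-repeating column: compact the selection vector row by row.
    int out = 0;
    for (int j = 0; j < b.size; j++) {
      int row = b.selected[j];
      if (b.keys[row] < cutoff) b.selected[out++] = row;
    }
    b.size = out;
  }
}
```

For a repeating batch of 1024 rows, the sketch does one comparison instead of 1024, which is the benefit that is lost once the RS operator sees individual rows.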
> Adding a TopN Key operator in the physical operator tree allows the following to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> This way, the TopNKey operator can remove rows before they are buffered into the GBY and consume memory.
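One way to sketch the TNK filtering step, assuming ascending sort order and a hypothetical `TopNKeyFilter` class (not Hive's actual operator): keep the N smallest distinct keys seen so far in a sorted set and forward a row only while its key is still among them. Rows with equal keys must all pass, because the GBY still needs every row of a surviving group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical sketch of a TopNKey-style filter (not Hive's TopNKeyOperator).
 * Tracks the N smallest distinct keys seen so far; a row whose key falls
 * outside that set can never belong to the final top-N groups.
 */
final class TopNKeyFilter<K extends Comparable<K>> {
  private final int topN;
  private final TreeSet<K> keys = new TreeSet<>(); // N smallest distinct keys so far

  TopNKeyFilter(int topN) {
    if (topN <= 0) throw new IllegalArgumentException("topN must be positive");
    this.topN = topN;
  }

  /** Returns true if the row with this key should be forwarded to the GBY. */
  boolean canForward(K key) {
    if (keys.contains(key)) return true;  // another row of a retained key
    if (keys.size() < topN) {             // set not yet full: accept
      keys.add(key);
      return true;
    }
    if (key.compareTo(keys.last()) < 0) { // beats the current N-th key
      keys.pollLast();
      keys.add(key);
      return true;
    }
    return false;                         // cannot be in the top N
  }

  /** Convenience helper: returns the keys of the rows that would be forwarded. */
  static <T extends Comparable<T>> List<T> filter(List<T> rowKeys, int topN) {
    TopNKeyFilter<T> f = new TopNKeyFilter<>(topN);
    List<T> out = new ArrayList<>();
    for (T k : rowKeys) {
      if (f.canForward(k)) out.add(k);
    }
    return out;
  }
}
```

With N = 2 and row keys 5, 3, 5, 9, 1, 3, 7, the rows with keys 9 and 7 are dropped. The early rows with key 5 are still forwarded because the filter cannot know that better keys will arrive later, so the GBY receives a superset of the rows it needs, never a subset.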
> Here's the equivalent implementation in Presto
> Adding this as a sub-feature of GroupBy prevents further optimizations if the GBY is
on keys "a,b,c" and the TopNKey is on just "a".
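Why a TopNKey on just "a" is still safe for a GBY on "a,b,c" can be shown with a short argument, assuming the query orders by (a, b, c) ascending with LIMIT N. Let G be the set of groups produced by the GBY and let a_(1) < ... < a_(N) be the N smallest distinct values of a. Each a_(i) appears in at least one group, so at least N groups have a <= a_(N), and in lexicographic order the first N groups must all come from among them:

```latex
\bigl|\{(a,b,c) \in G : a \le a_{(N)}\}\bigr| \ge N
\;\Longrightarrow\;
\operatorname{top}_N(G) \subseteq \{(a,b,c) \in G : a \le a_{(N)}\}
```

Rows with a > a_(N) therefore contribute to no group in the result and can be dropped before the GBY, which a standalone operator keyed on "a" can do independently of the GBY's own keys.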

This message was sent by Atlassian JIRA
