hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <>
Subject [jira] [Commented] (HIVE-15474) Extend limit propagation for chain of RS-GB-RS operators
Date Tue, 20 Dec 2016 18:35:59 GMT


Jesus Camacho Rodriguez commented on HIVE-15474:

[~xuefuz], thanks for leaving the comment; it would be great if you could take a look at the
patch too.

Propagating limit _N_ to GBy is valid iff the GBy columns are a prefix of the OBy columns. This
is because GBy does not produce duplicates for those columns, while the RS-based Hive
implementation ensures that the GBy output actually follows a certain order on them. Thus, we
know that the first _N_ records the GBy outputs are the top _N_ records.
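
To illustrate the argument, here is a minimal Python sketch. It is not Hive code: the helper names ({{aggregate}}, {{run}}) and the sample rows are hypothetical, and sorting the input stands in for the order that RS provides. It compares a plan that applies the limit only after the final sort with one that also stops the GBy after _N_ groups. It does this for an OBy that starts with the GBy columns and for one that does not.

```python
from itertools import groupby

def aggregate(rows, group_limit=None):
    """Count rows per (key, value) group.

    Rows are sorted first, standing in for the order RS would provide.
    If group_limit is set, stop after emitting that many groups.
    """
    out = []
    for (key, value), grp in groupby(sorted(rows)):
        if group_limit is not None and len(out) == group_limit:
            break
        out.append((key, value, sum(1 for _ in grp)))
    return out

def run(rows, n, order_key, push_limit):
    """GBy, then OBy on order_key, then LIMIT n.

    With push_limit=True the limit is also applied inside the GBy.
    """
    groups = aggregate(rows, n if push_limit else None)
    return sorted(groups, key=order_key)[:n]

# Hypothetical sample data: (key, value) pairs.
rows = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "y"), ("c", "z"), ("c", "z")]

# OBy key, value, agg1: the GBy columns are a prefix of the OBy columns.
def by_group_cols_first(r):
    return (r[0], r[1], r[2])

# OBy agg1, key, value: the GBy columns are not a prefix of the OBy columns.
def by_agg_first(r):
    return (r[2], r[0], r[1])

prefix_pushed = run(rows, 2, by_group_cols_first, push_limit=True)
prefix_full = run(rows, 2, by_group_cols_first, push_limit=False)
agg_pushed = run(rows, 2, by_agg_first, push_limit=True)
agg_full = run(rows, 2, by_agg_first, push_limit=False)
```

On this sample, the two plans agree when the OBy starts with the GBy columns and disagree when it starts with the aggregate. This is consistent with the condition above, though a single example is of course not a proof.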

I took a conservative approach, as we need to be sure that we remain correct; it might be that
the condition could be relaxed even further for some corner cases. However, we should not
do it without double-checking the theoretical background.

> Extend limit propagation for chain of RS-GB-RS operators
> --------------------------------------------------------
>                 Key: HIVE-15474
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-15474.patch
> The goal is to extend the work started in HIVE-14002.
> For instance, given the following query:
> {code:sql}
> explain
> select key, value, count(key + 1) as agg1 from src 
> group by key, value
> order by key, value, agg1 limit 20;
> {code}
> We can push the limit to the GBy operator. However, currently we do not do it.
