hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Prateek Rungta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11742) last_value window specifier enforces ordering as a partition
Date Fri, 04 Sep 2015 21:56:45 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731499#comment-14731499
] 

Prateek Rungta commented on HIVE-11742:
---------------------------------------

If folks agree with the change, I'd like to use this JIRA to contribute to Hive. Please assign
it to me.

> last_value window specifier enforces ordering as a partition
> ------------------------------------------------------------
>
>                 Key: HIVE-11742
>                 URL: https://issues.apache.org/jira/browse/HIVE-11742
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>            Reporter: Prateek Rungta
>
> [HIVE-4262|https://issues.apache.org/jira/browse/HIVE-4262] changed the partitioning
behavior of the last_value function. For a specified last_value() OVER X. The ordering spec
within X is used in addition to the partition spec for partitioning. i.e. last_value(a) OVER
(PARTITION BY i ORDER BY j) operates last_value(a) on all rows within the unique combination
of (i,j). The behavior I'd expect is for PARTITION BY i to define the partitioning, and ORDER
BY to define the ordering within the PARTITION. i.e. last_value(a) OVER (PARTITION BY i ORDER
BY j) should operate last_value(a) on all rows within the unique values of (i), ordered by
j within the partition. 
> This was changed to be consistent with how SQLServer handled such queries. [SQLServer
Docs|https://msdn.microsoft.com/en-us/library/hh231517.aspx?f=255&MSPPError=-2147217396]
describe their example (which performs as Hive does): 
> {quote}
> The PARTITION BY clause partitions the employees by department and the LAST_VALUE function
is applied to each partition independently. The ORDER BY clause specified in the OVER clause
determines the logical order in which the LAST_VALUE function is applied to the rows in each
partition.
> {quote}
> To me, their behavior is inconsistent with their description. I've filled an [upstream
bug|https://connect.microsoft.com/SQLServer/feedback/details/1753482] with Microsoft for the
same. 
> [Oracle|https://oracle-base.com/articles/misc/first-value-and-last-value-analytic-functions]
and [Redshift|http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_firstlast_WF.html]
both exhibit the behavior I'd expect.
> Considering Hive-4262 has been in core for 2+ years, I don't think we can change the
behavior without potentially impacting clients. But I would like a way to enable the expected
behavior at the least (behind a config flag maybe?). What do you think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message