hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jesus Camacho Rodriguez (JIRA)" <>
Subject [jira] [Commented] (HIVE-16102) Grouping sets do not conform to SQL standard
Date Mon, 06 Mar 2017 11:15:33 GMT


Jesus Camacho Rodriguez commented on HIVE-16102:

[~ashutoshc], I have been investigating a little bit more the issue you pointed to.

One of the problems is that the message does not even describe clearly what the error is about:
_having only empty groups in the grouping sets is not allowed_. This was added in HIVE-3471,
together with some negative tests. However, the reason why this restriction is included is
not described in the JIRA case, and the review board link (where there might be some additional
information) does not work anymore. I added tests myself and empty grouping sets seems to
be working perfectly even when it is on its own; I also do not see any reason why this should
not be the case.

I am thinking that 1) I will revert that part of the patch and just change the error message,
this will fix every issue while also being less risky to backport, and 2) then in a follow-up
JIRA I will lift the restriction, as there is no need to backport that and we can just include
it in next release.

Concerning HIVE_GROUPING_SETS_EXPR_NOT_IN_GROUPBY error, that one makes sense, since it seeks
to prevent cases such as:
where _b_ is not part of the group by expression.

> Grouping sets do not conform to SQL standard
> --------------------------------------------
>                 Key: HIVE-16102
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Operators, Parser
>    Affects Versions: 1.3.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>         Attachments: HIVE-16102.patch
> [~ashutoshc] realized that the implementation of GROUPING__ID in Hive was not returning
values as specified by SQL standard and other execution engines.
> After digging into this, I found out that the implementation was bogus, as internally
it was changing between big-endian/little-endian representation of GROUPING__ID indistinctly,
and in some cases conversions in both directions were cancelling each other.
> In the documentation in,+Cube,+Grouping+and+Rollup
we can already find the problem, even if we did not spot it at first.
> {quote}
> The following query: SELECT key, value, GROUPING__ID, count(\*) from T1 GROUP BY key,
> will have the following results.
> | NULL | NULL | 0 | 6 |
> | 1 | NULL | 1 | 2 |
> | 1 | NULL | 3 | 1 |
> | 1 | 1 | 3 | 1 |
> ...
> {quote}
> Observe that value for GROUPING__ID in first row should be `3`, while for third and fourth
rows, it should be `0`.

This message was sent by Atlassian JIRA

View raw message