hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17043) Remove non unique columns from group by keys if not referenced later
Date Sun, 23 Sep 2018 23:07:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625288#comment-16625288 ]

Hive QA commented on HIVE-17043:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12940852/HIVE-17043.5.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/14008/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14008/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14008/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL https://issues.apache.org/jira/secure/attachment/12940852/HIVE-17043.5.patch
was found in seen patch url's cache and a test was probably run already on it. Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12940852 - PreCommit-HIVE-Build

> Remove non unique columns from group by keys if not referenced later
> --------------------------------------------------------------------
>
>                 Key: HIVE-17043
>                 URL: https://issues.apache.org/jira/browse/HIVE-17043
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Logical Optimizer
>    Affects Versions: 3.0.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Vineet Garg
>            Priority: Major
>         Attachments: HIVE-17043.1.patch, HIVE-17043.2.patch, HIVE-17043.3.patch, HIVE-17043.4.patch, HIVE-17043.5.patch
>
>
> Group by keys may be a mix of unique (or primary) keys and regular columns. In such cases,
> the presence of regular columns does not alter the cardinality of the groups. So, if the regular
> columns are not referenced later, they can be dropped from the group by keys. Depending on the
> operator tree, this may result in those columns not being read from disk at all in the best
> case. In the worst case, we still avoid shuffling and sorting the regular columns from mapper
> to reducer, which could be substantial CPU and network savings.
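
The claim in the description can be checked on a small example. The following is only a sketch in plain Python, not Hive's optimizer code: the rows, the column names (id, name, amount), and the helper group_sum are hypothetical, and it assumes that every id value determines a single name, which is what a unique or primary key on id implies.

```python
from collections import defaultdict


def group_sum(rows, keys, value):
    """Sum `value` for each distinct combination of the `keys` columns."""
    groups = defaultdict(int)
    for row in rows:
        groups[tuple(row[k] for k in keys)] += row[value]
    return dict(groups)


# Hypothetical rows: `id` is assumed to be a unique key of the source table,
# so each `id` maps to exactly one `name` (a regular column).
rows = [
    {"id": 1, "name": "alice", "amount": 10},
    {"id": 1, "name": "alice", "amount": 5},
    {"id": 2, "name": "bob", "amount": 20},
    {"id": 3, "name": "alice", "amount": 30},
]

full = group_sum(rows, ["id", "name"], "amount")  # GROUP BY id, name
reduced = group_sum(rows, ["id"], "amount")       # GROUP BY id

# Same number of groups, and the same sum per group once `name` is ignored.
assert len(full) == len(reduced)
assert {key[:1]: total for key, total in full.items()} == reduced
```

Because name never splits a group that id has already formed, the smaller key gives the same groups, so name does not need to be shuffled or sorted unless a later operator references it.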



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
