hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator
Date Tue, 17 Jul 2018 12:22:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546510#comment-16546510 ]

Hive QA commented on HIVE-17896:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12931901/HIVE-17896.12.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 48 failed/errored test(s), 14668 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_struct_type_vectorization] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parquet_complex_types_vectorization] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parquet_map_type_vectorization] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parquet_struct_type_vectorization] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_groupby] (batchId=179)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[check_constraint] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown3] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_decimal64_reader] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer] (batchId=173)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_cast_constant] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_2] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_sets_limit] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_reduce] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_mr_diff_schema_alias] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_reduce_groupby_decimal] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_string_concat] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_limit] (batchId=165)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query10] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query15] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query17] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query25] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query26] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query27] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query29] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query35] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query37] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query40] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query43] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query45] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query49] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query50] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query5] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query60] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query66] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query69] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query76] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query77] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query7] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query80] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query82] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query8] (batchId=262)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query99] (batchId=262)
org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerWithOldConf.testMetaConfNotifyListenersClosingClient (batchId=227)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/12653/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12653/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12653/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 48 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12931901 - PreCommit-HIVE-Build

> TopNKey: Create a standalone vectorizable TopNKey operator
> ----------------------------------------------------------
>
>                 Key: HIVE-17896
>                 URL: https://issues.apache.org/jira/browse/HIVE-17896
>             Project: Hive
>          Issue Type: New Feature
>          Components: Operators
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>            Assignee: Teddy Choi
>            Priority: Major
>         Attachments: HIVE-17896.1.patch, HIVE-17896.10.patch, HIVE-17896.11.patch, HIVE-17896.12.patch, HIVE-17896.3.patch, HIVE-17896.4.patch, HIVE-17896.5.patch, HIVE-17896.6.patch, HIVE-17896.7.patch, HIVE-17896.8.patch, HIVE-17896.9.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by: the group-by operator buffers up all the rows before 99% of them are discarded in the TopN hash within the ReduceSink operator.
> The RS TopN operator is very restrictive, as it only supports filtering on the shuffle keys; it is better to do this filtering before breaking the vectors into rows and losing the isRepeating properties.
> Adding a TopNKey operator to the physical operator tree allows the following to happen:
> GBY->RS(Top=1)
> can become
> TNK(1)->GBY->RS(Top=1)
> so that the TopNKey operator can remove rows before they are buffered into the GBY and consume memory.
> Here's the equivalent implementation in Presto:
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy would prevent further optimizations when the GBY is on keys "a,b,c" and the TopNKey is on just "a".
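
The filtering idea in the description above can be illustrated with a rough sketch. This is not Hive's implementation (which is a vectorized Java operator) and not the Presto operator linked above; it is a minimal row-at-a-time Python illustration, and the names `top_n_key_filter`, `rows`, `key`, and `n` are hypothetical. It assumes ascending order and keeps at most `n` distinct keys in memory, forwarding a row only if its key could still belong to the final top N.

```python
import bisect
from typing import Any, Callable, Iterable, Iterator, List


def top_n_key_filter(
    rows: Iterable[Any], key: Callable[[Any], Any], n: int
) -> Iterator[Any]:
    """Forward rows whose key may still be among the n smallest distinct keys.

    Hypothetical sketch: a streaming pre-filter placed in front of a
    group-by, so that rows with hopeless keys are never buffered.
    """
    if n <= 0:
        return
    best: List[Any] = []  # sorted list of at most n distinct keys seen so far
    for row in rows:
        k = key(row)
        i = bisect.bisect_left(best, k)
        if i < len(best) and best[i] == k:
            # Key is already tracked as a top-n candidate.
            yield row
        elif len(best) < n:
            # Fewer than n distinct keys seen: every new key is a candidate.
            best.insert(i, k)
            yield row
        elif i < n:
            # New key beats the current n-th key: track it, evict the worst.
            best.insert(i, k)
            best.pop()
            yield row
        # Otherwise the key is worse than all n tracked keys: drop the row.


if __name__ == "__main__":
    # Rows shaped like (a, payload); the filter looks only at "a".
    rows = [("b", 1), ("a", 2), ("c", 3), ("a", 4), ("b", 5)]
    for row in top_n_key_filter(rows, key=lambda r: r[0], n=1):
        print(row)
```

With `n=1`, the rows keyed "c" and the later row keyed "b" are dropped before they reach any downstream buffering. The first "b" row is still forwarded because it arrived before "a" was seen, so the filter is only a pre-filter: in this sketch, a downstream stage corresponding to GBY->RS(Top=1) would still be needed for the exact result. Because `key` is independent of any grouping, the same filter can run on just "a" while a downstream group-by uses "a,b,c".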



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
