[ https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105304#comment-17105304 ]
Hive QA commented on HIVE-23438:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/13002689/HIVE-23438.branch-2.3.patch
{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 44 failed/errored test(s), 10635 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_union_src] (batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union24] (batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_fast_stats] (batchId=48)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_lineage2] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_4] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage2] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_dynpart_hashjoin_1] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_semijoin_reduction2] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_semijoin_reduction] (batchId=142)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] (batchId=96)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=97)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[hybridgrace_hashjoin_2] (batchId=96)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[merge_negative_1] (batchId=87)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] (batchId=123)
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion02 (batchId=264)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testNonAcidToAcidConversion02 (batchId=276)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion02 (batchId=273)
org.apache.hive.beeline.cli.TestHiveCli.testCmd (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testDatabaseOptions (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testErrOutput (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testHelp (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testInValidCmd (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testInvalidDatabaseOptions (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testInvalidOptions (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testInvalidOptions2 (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testSetHeaderValue (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testSetPromptValue (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testSourceCmd (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testSourceCmd2 (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testSourceCmd3 (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testSqlFromCmd (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testSqlFromCmdWithDBName (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testUseCurrentDB1 (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testUseCurrentDB2 (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testUseCurrentDB3 (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testUseInvalidDB (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testVariables (batchId=173)
org.apache.hive.beeline.cli.TestHiveCli.testVariablesForSource (batchId=173)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/22281/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/22281/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-22281/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 44 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 13002689 - PreCommit-HIVE-Build
> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --------------------------------------------------------------
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
> Issue Type: Bug
> Components: SQL, Tez
> Affects Versions: 2.3.4
> Reporter: 范宜臻
> Priority: Major
> Attachments: HIVE-23438.branch-2.3.patch
>
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data (q_test_init_tez.sql)*
> {code:java}
> -- create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> -- create table src2
> CREATE TABLE src2 (key STRING COMMENT 'default', value STRING COMMENT 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO TABLE src2;
> -- create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask=true;
> set hive.auto.convert.join.noconditionaltask.size=10000000;
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT*
>
> {code:java}
> 128 NULL NULL 128
> 146 146 1val_1461 146
> 150 150 1val_1501 150
> 238 NULL NULL 238
> 369 NULL NULL 369
> 406 406 1val_4061 406
> 273 273 1val_2731 273
> 98 NULL NULL 98
> 213 213 1val_2131 213
> 255 NULL NULL 255
> 401 401 1val_4011 401
> 278 NULL NULL 278
> 66 66 11val_6611 66
> 224 NULL NULL 224
> 311 NULL NULL 311
> {code}
>
> *ACTUAL RESULT*
> {code:java}
> 128 NULL NULL 128
> 146 146 1val_1461 146
> 150 150 1val_1501 150
> 213 213 1val_2131 213
> 238 NULL NULL 238
> 273 273 1val_2731 273
> 369 NULL NULL 369
> 406 406 1val_4061 406
> 98 NULL NULL 98
> 401 401 1val_4011 401
> 66 66 11val_6611 66
> {code}
>
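> The missing rows can be found by comparing the first column (x.key) of the two results above. Row order is ignored because the query has no ORDER BY. The following Java sketch hard-codes keys copied from EXPECTED RESULT and ACTUAL RESULT. The class and field names (MissingRowCheck, EXPECTED_NO_MATCH) are hypothetical.
```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: compares the x.key column of the two result sets above.
class MissingRowCheck {
    // First column (x.key) of EXPECTED RESULT, in the order listed.
    static final List<Integer> EXPECTED = Arrays.asList(
        128, 146, 150, 238, 369, 406, 273, 98, 213, 255, 401, 278, 66, 224, 311);

    // Keys whose src2 columns are NULL in EXPECTED RESULT, i.e. src1 rows
    // that have no match in src2 and are kept only by the LEFT OUTER JOIN.
    static final Set<Integer> EXPECTED_NO_MATCH = new HashSet<>(Arrays.asList(
        128, 238, 369, 98, 255, 278, 224, 311));

    // First column (x.key) of ACTUAL RESULT.
    static final List<Integer> ACTUAL = Arrays.asList(
        128, 146, 150, 213, 238, 273, 369, 406, 98, 401, 66);

    // Keys present in EXPECTED but absent from ACTUAL, in EXPECTED order.
    static Set<Integer> missingKeys() {
        Set<Integer> missing = new LinkedHashSet<>(EXPECTED);
        missing.removeAll(ACTUAL);
        return missing;
    }

    public static void main(String[] args) {
        Set<Integer> missing = missingKeys();
        System.out.println("missing keys: " + missing); // [255, 278, 224, 311]
        System.out.println("all missing rows are no-match rows: "
            + EXPECTED_NO_MATCH.containsAll(missing)); // true
    }
}
```
> The four missing keys (255, 278, 224 and 311) all have NULL src2 columns in the expected result. This matches the NO_MATCH case described under ROOT CAUSE.
>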
> *ROOT CAUSE*
> In src1 LEFT JOIN src2, src1 is the big table and src2 is the small table. When a big-table row is joined against the corresponding hashtable, the result may be NO_MATCH. These NO_MATCH rows must still be emitted because this is a LEFT OUTER JOIN.
> However, these big-table rows are not spilled into the on-disk match file of this hashtable, because only the SPILL state triggers `spillBigTableRow`. Instead, they are spilled into the match file of the hashtable for `srcpart` (the second small table).
> Finally, during reProcessBigTable, big-table rows are read only from the match file of `firstSmallTable`, so the rows spilled elsewhere are missing from the result.
>
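> The routing described above can be modeled in a few lines. The following Java sketch is a hypothetical, heavily simplified model: JoinResult, Row, spill() and reprocess() are illustrative names, not Hive classes. The per-key spill states are assumed for illustration and do not come from the test run. Small table 0 stands for src2 and small table 1 for srcpart.
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the spill routing described in ROOT CAUSE.
// None of these types are Hive classes.
class BuggySpillRouting {
    // Simplified lookup outcome of a big-table row against one small-table hashtable.
    enum JoinResult { MATCH, NO_MATCH, SPILL }

    // A big-table (src1) row with its lookup result per small table:
    // index 0 = src2 (the first small table), index 1 = srcpart.
    static final class Row {
        final String key;
        final JoinResult[] results;

        Row(String key, JoinResult... results) {
            this.key = key;
            this.results = results;
        }
    }

    // Routing as described in the report: the row is written only to the match
    // file of the first small table whose lookup returned SPILL. A NO_MATCH on
    // src2 never spills, so such a row can land in srcpart's match file.
    static List<List<String>> spill(List<Row> rows, int numSmallTables) {
        List<List<String>> matchFiles = new ArrayList<>();
        for (int i = 0; i < numSmallTables; i++) {
            matchFiles.add(new ArrayList<>());
        }
        for (Row row : rows) {
            for (int i = 0; i < numSmallTables; i++) {
                if (row.results[i] == JoinResult.SPILL) {
                    matchFiles.get(i).add(row.key);
                    break; // each row is spilled at most once
                }
            }
        }
        return matchFiles;
    }

    // Reprocessing reads big-table rows from the first small table's match
    // file only, so rows routed to any other match file are never joined again.
    static List<String> reprocess(List<List<String>> matchFiles) {
        return matchFiles.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical spill states: k1 has no match in src2 and hits a spilled
        // srcpart partition; k2 hits a spilled src2 partition.
        List<Row> rows = Arrays.asList(
            new Row("k1", JoinResult.NO_MATCH, JoinResult.SPILL),
            new Row("k2", JoinResult.SPILL, JoinResult.MATCH));
        List<List<String>> matchFiles = spill(rows, 2);
        System.out.println("match files: " + matchFiles); // [[k2], [k1]]
        System.out.println("reprocessed: " + reprocess(matchFiles)); // [k2]
    }
}
```
> In this model, k1 is written to srcpart's match file and never read back, which corresponds to the lost NO_MATCH rows.
>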
> *WORKAROUND*
> Configure firstSmallTable in completeInitializationOp. When spilling to a match file, spill big-table rows only into the match file of firstSmallTable.
>
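> The effect of the workaround can be sketched with the same kind of hypothetical, simplified model. The names are illustrative and not taken from the attached patch, and the spill states are assumed. The only change is that every spilled big-table row goes to the first small table's match file, which is the file that reprocessing reads.
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the WORKAROUND; not the attached patch.
class FixedSpillRouting {
    // Simplified lookup outcome of a big-table row against one small-table hashtable.
    enum JoinResult { MATCH, NO_MATCH, SPILL }

    // A big-table (src1) row with its lookup result per small table:
    // index 0 = src2 (the first small table), index 1 = srcpart.
    static final class Row {
        final String key;
        final JoinResult[] results;

        Row(String key, JoinResult... results) {
            this.key = key;
            this.results = results;
        }
    }

    // Position of the first small table. The report configures firstSmallTable
    // in completeInitializationOp; this model simply uses a constant.
    static final int FIRST_SMALL_TABLE = 0;

    // Workaround routing: a row that must be spilled for any small table is
    // written to the first small table's match file.
    static List<List<String>> spill(List<Row> rows, int numSmallTables) {
        List<List<String>> matchFiles = new ArrayList<>();
        for (int i = 0; i < numSmallTables; i++) {
            matchFiles.add(new ArrayList<>());
        }
        for (Row row : rows) {
            for (JoinResult result : row.results) {
                if (result == JoinResult.SPILL) {
                    matchFiles.get(FIRST_SMALL_TABLE).add(row.key);
                    break; // each row is spilled at most once
                }
            }
        }
        return matchFiles;
    }

    // Reprocessing still reads only the first small table's match file.
    static List<String> reprocess(List<List<String>> matchFiles) {
        return matchFiles.get(FIRST_SMALL_TABLE);
    }

    public static void main(String[] args) {
        // Same hypothetical spill states as in the ROOT CAUSE model.
        List<Row> rows = Arrays.asList(
            new Row("k1", JoinResult.NO_MATCH, JoinResult.SPILL),
            new Row("k2", JoinResult.SPILL, JoinResult.MATCH));
        List<List<String>> matchFiles = spill(rows, 2);
        System.out.println("match files: " + matchFiles); // [[k1, k2], []]
        System.out.println("reprocessed: " + reprocess(matchFiles)); // [k1, k2]
    }
}
```
> The sketch covers only where spilled rows are written. It does not model how the reprocessed rows are then joined against each small table's spilled partitions.
>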
--
This message was sent by Atlassian Jira
(v8.3.4#803005)