hive-issues mailing list archives

From "Hive QA (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
Date Tue, 12 May 2020 04:12:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105045#comment-17105045 ]

Hive QA commented on HIVE-23438:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/13002664/HIVE-23438.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/22276/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/22276/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-22276/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and
output '+ date '+%Y-%m-%d %T.%3N'
2020-05-12 04:10:16.427
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-22276/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2020-05-12 04:10:16.430
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at ee4daec HIVE-23414: Detail Hive Java Compatibility (David Mollitor, reviewed
by Naveen Gangam)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at ee4daec HIVE-23414: Detail Hive Java Compatibility (David Mollitor, reviewed
by Naveen Gangam)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2020-05-12 04:10:18.042
+ rm -rf ../yetus_PreCommit-HIVE-Build-22276
+ mkdir ../yetus_PreCommit-HIVE-Build-22276
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-22276
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-22276/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
Trying to apply the patch with -p0
error: a/data/scripts/q_test_init_tez.sql: does not exist in index
error: a/pom.xml: does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java: does not exist in
index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java: does
not exist in index
error: a/ql/src/test/queries/clientpositive/hybridgrace_hashjoin_2.q: does not exist in index
error: a/ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out: does not exist
in index
Trying to apply the patch with -p1
error: patch failed: data/scripts/q_test_init_tez.sql:26
Falling back to three-way merge...
Applied patch to 'data/scripts/q_test_init_tez.sql' with conflicts.
error: patch failed: pom.xml:1065
Falling back to three-way merge...
Applied patch to 'pom.xml' with conflicts.
error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java:238
Falling back to three-way merge...
Applied patch to 'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java'
with conflicts.
error: patch failed: ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out:1423
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out' cleanly.
Going to apply patch with: git apply -p1
error: patch failed: data/scripts/q_test_init_tez.sql:26
Falling back to three-way merge...
Applied patch to 'data/scripts/q_test_init_tez.sql' with conflicts.
error: patch failed: pom.xml:1065
Falling back to three-way merge...
Applied patch to 'pom.xml' with conflicts.
error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java:238
Falling back to three-way merge...
Applied patch to 'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java'
with conflicts.
error: patch failed: ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out:1423
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out' cleanly.
U data/scripts/q_test_init_tez.sql
U pom.xml
U ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-22276
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 13002664 - PreCommit-HIVE-Build

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --------------------------------------------------------------
>
>                 Key: HIVE-23438
>                 URL: https://issues.apache.org/jira/browse/HIVE-23438
>             Project: Hive
>          Issue Type: Bug
>          Components: SQL, Tez
>    Affects Versions: 2.3.4
>            Reporter: 范宜臻
>            Priority: Major
>         Attachments: HIVE-23438.patch
>
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> -- create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> -- create table src2
> CREATE TABLE src2 (key STRING COMMENT 'default', value STRING COMMENT 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO TABLE src2;
> -- create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=10000000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT*
>  
> {code:java}
> 128	NULL	NULL	128
> 146	146	1val_1461	146
> 150	150	1val_1501	150
> 238	NULL	NULL	238
> 369	NULL	NULL	369
> 406	406	1val_4061	406
> 273	273	1val_2731	273
> 98	NULL	NULL	98
> 213	213	1val_2131	213
> 255	NULL	NULL	255
> 401	401	1val_4011	401
> 278	NULL	NULL	278
> 66	66	11val_6611	66
> 224	NULL	NULL	224
> 311	NULL	NULL	311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128	NULL	NULL	128
> 146	146	1val_1461	146
> 150	150	1val_1501	150
> 213	213	1val_2131	213
> 238	NULL	NULL	238
> 273	273	1val_2731	273
> 369	NULL	NULL	369
> 406	406	1val_4061	406
> 98	NULL	NULL	98
> 401	401	1val_4011	401
> 66	66	11val_6611	66
> {code}
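
To make the difference between the two listings explicit, here is a minimal sketch that compares them. The rows are copied from the EXPECTED RESULT and ACTUAL RESULT listings above; the variable and function names are illustrative only and are not part of Hive or its test framework.

```python
# Rows copied from the EXPECTED RESULT and ACTUAL RESULT listings above.
EXPECTED = """
128 NULL NULL 128
146 146 1val_1461 146
150 150 1val_1501 150
238 NULL NULL 238
369 NULL NULL 369
406 406 1val_4061 406
273 273 1val_2731 273
98 NULL NULL 98
213 213 1val_2131 213
255 NULL NULL 255
401 401 1val_4011 401
278 NULL NULL 278
66 66 11val_6611 66
224 NULL NULL 224
311 NULL NULL 311
"""

ACTUAL = """
128 NULL NULL 128
146 146 1val_1461 146
150 150 1val_1501 150
213 213 1val_2131 213
238 NULL NULL 238
273 273 1val_2731 273
369 NULL NULL 369
406 406 1val_4061 406
98 NULL NULL 98
401 401 1val_4011 401
66 66 11val_6611 66
"""


def parse(listing):
    """Split a whitespace-separated listing into one tuple per row."""
    return [tuple(line.split()) for line in listing.strip().splitlines()]


expected_rows = parse(EXPECTED)
actual_rows = parse(ACTUAL)
actual_set = set(actual_rows)

# Rows that should have been returned but were not, in expected order.
missing = [row for row in expected_rows if row not in actual_set]
missing_keys = [row[0] for row in missing]

# True when every missing row is one that found no match in src2.
all_unmatched = all(row[1] == "NULL" for row in missing)

print(missing_keys, all_unmatched)
```

The four missing keys are 255, 278, 224 and 311, and each of them is a row with no match in src2 (NULL in the z columns). Other unmatched rows such as 128 and 238 are still returned, so the loss affects only some of the unmatched rows.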
>  
> *ROOT CAUSE*
> In src1 left join src2, src1 is the big table and src2 is the small table. Probing the corresponding hashtable with a big table row may yield the NO_MATCH state; however, these NO_MATCH rows are still needed because the join is a LEFT OUTER JOIN.
> In addition, these big table rows are not spilled into the on-disk matchfile of this hashtable, because only the SPILL state can use `spillBigTableRow`. Instead, they are spilled into the matchfile of the hashtable for table `srcpart` (the second small table).
> Finally, in reProcessBigTable, the big table rows in the matchfile are read only from `firstSmallTable`, so some rows are missing.
>  
> *WORKAROUND*
> Configure firstSmallTable in completeInitializationOp, and spill big table rows only into firstSmallTable when spilling to the matchfile.
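
The root cause and workaround described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Hive's implementation: the `SmallTable` class, the `n_way_join` function, the `spill_to_first` flag, and the sample keys are all hypothetical, and real hash partitions are reduced to a set of keys marked as spilled. The state names MATCH, NO_MATCH and SPILL and the idea of a per-hashtable matchfile follow the ticket's description.

```python
from dataclasses import dataclass, field

MATCH, NO_MATCH, SPILL = "MATCH", "NO_MATCH", "SPILL"


@dataclass
class SmallTable:
    """Hypothetical stand-in for one small-table hashtable."""
    rows: dict            # key -> value, full contents of the small table
    spilled_keys: set     # keys whose hash partition is on disk
    matchfile: list = field(default_factory=list)  # deferred big table rows

    def probe(self, key):
        if key in self.spilled_keys:
            return SPILL
        return MATCH if key in self.rows else NO_MATCH


def n_way_join(big_keys, z, y, spill_to_first):
    """x LEFT JOIN z JOIN y, with z as the first small table."""
    out = []

    def emit(key):
        if key in y.rows:                            # inner join with y
            out.append((key, z.rows.get(key), key))  # left outer join with z

    for key in big_keys:
        spilled = [t for t in (z, y) if t.probe(key) == SPILL]
        if not spilled:
            emit(key)
            continue
        # Behaviour described in the ticket: the row goes to the matchfile
        # of whichever table reported SPILL. Workaround: always use the
        # first small table.
        target = z if spill_to_first else spilled[0]
        target.matchfile.append(key)

    # Reprocessing reads deferred rows only from the first small table.
    for key in z.matchfile:
        emit(key)
    return out


def make_tables():
    # Illustrative data, not taken from the test files.
    z = SmallTable(rows={2: "z2"}, spilled_keys={2})
    y = SmallTable(rows={1: "y1", 2: "y2", 3: "y3"}, spilled_keys={3})
    return z, y


buggy = n_way_join([1, 2, 3], *make_tables(), spill_to_first=False)
fixed = n_way_join([1, 2, 3], *make_tables(), spill_to_first=True)
print(buggy)
print(fixed)
```

In this model, key 3 gets NO_MATCH from the first small table and SPILL from the second, so without the workaround it lands in the second table's matchfile and is never reprocessed. With `spill_to_first=True` it is written to the first table's matchfile and reappears as an unmatched (NULL) outer-join row.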
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
