hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-19097) related equals and in operators may cause inaccurate stats estimations
Date Wed, 01 Aug 2018 17:42:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565696#comment-16565696 ]

Hive QA commented on HIVE-19097:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12933945/HIVE-19097.06wip01.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/12983/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12983/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12983/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2018-08-01 17:40:40.966
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-12983/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-08-01 17:40:40.970
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 4d43695 HIVE-18201 : Disable XPROD_EDGE for sq_count_check()  created for scalar subqueries (Ashutosh Chauhan via Jesus Camacho Rodriguez)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 4d43695 HIVE-18201 : Disable XPROD_EDGE for sq_count_check()  created for scalar subqueries (Ashutosh Chauhan via Jesus Camacho Rodriguez)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-08-01 17:40:42.382
+ rm -rf ../yetus_PreCommit-HIVE-Build-12983
+ mkdir ../yetus_PreCommit-HIVE-Build-12983
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-12983
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-12983/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java:3525
Falling back to three-way merge...
Applied patch to 'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java' cleanly.
error: patch failed: ql/src/test/results/clientpositive/cbo_rp_simple_select.q.out:863
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/cbo_rp_simple_select.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/cbo_simple_select.q.out:863
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/cbo_simple_select.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/list_bucket_query_multiskew_2.q.out:261
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/list_bucket_query_multiskew_2.q.out' with conflicts.
Going to apply patch with: git apply -p0
/data/hiveptest/working/scratch/build.patch:2022: trailing whitespace.
        Map 10 
/data/hiveptest/working/scratch/build.patch:2055: trailing whitespace.
        Map 8 
/data/hiveptest/working/scratch/build.patch:2120: trailing whitespace.
        Map 1 
/data/hiveptest/working/scratch/build.patch:2163: trailing whitespace.
        Map 6 
/data/hiveptest/working/scratch/build.patch:2171: trailing whitespace.
        Map 7 
error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java:3525
Falling back to three-way merge...
Applied patch to 'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java' cleanly.
error: patch failed: ql/src/test/results/clientpositive/cbo_rp_simple_select.q.out:863
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/cbo_rp_simple_select.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/cbo_simple_select.q.out:863
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/cbo_simple_select.q.out' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/list_bucket_query_multiskew_2.q.out:261
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/list_bucket_query_multiskew_2.q.out' with conflicts.
U ql/src/test/results/clientpositive/cbo_rp_simple_select.q.out
U ql/src/test/results/clientpositive/cbo_simple_select.q.out
U ql/src/test/results/clientpositive/list_bucket_query_multiskew_2.q.out
warning: squelched 32 whitespace errors
warning: 37 lines add whitespace errors.
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-12983
+ exit 1
'
{noformat}
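
The build stopped because three golden files (.q.out) no longer apply on top of master, while VectorizationContext.java merged cleanly. As a minimal sketch, assuming the console output is available as plain text, the conflicting paths can be pulled out of the log as follows. The function name conflicted_files is hypothetical and is not part of the Hive ptest tooling.

```python
import re

# Matches the line git prints after falling back to a three-way merge, e.g.
#   Applied patch to 'path/to/file' with conflicts.
# \s+ also accepts a line break, since console output may be line-wrapped.
_CONFLICT = re.compile(r"Applied patch to '([^']+)'\s+with conflicts\.")


def conflicted_files(log_text):
    """Return the distinct paths reported as applied 'with conflicts', in order."""
    seen = []
    for path in _CONFLICT.findall(log_text):
        if path not in seen:  # the log repeats each file once per apply attempt
            seen.append(path)
    return seen


# Sample lines copied from the console output above.
sample = """\
Applied patch to 'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java'
cleanly.
Applied patch to 'ql/src/test/results/clientpositive/cbo_simple_select.q.out' with conflicts.
Applied patch to 'ql/src/test/results/clientpositive/list_bucket_query_multiskew_2.q.out'
with conflicts.
"""

for path in conflicted_files(sample):
    print(path)
```

Files that merged cleanly are skipped, so the result lists only the paths that need a rebase before the patch is resubmitted.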

This message is automatically generated.

ATTACHMENT ID: 12933945 - PreCommit-HIVE-Build

> related equals and in operators may cause inaccurate stats estimations
> ----------------------------------------------------------------------
>
>                 Key: HIVE-19097
>                 URL: https://issues.apache.org/jira/browse/HIVE-19097
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-19097.01.patch, HIVE-19097.02.patch, HIVE-19097.03.patch, HIVE-19097.04.patch, HIVE-19097.05.patch, HIVE-19097.06wip01.patch, HIVE-19097.partial.patch
>
>
> tpcds#74 is optimized in such a way that the filter condition on date_dim contains both an IN and an = predicate on the same column:
> {code:java}
> |             Map Operator Tree:                     |
> |                 TableScan                          |
> |                   alias: date_dim                  |
> |                   filterExpr: (((d_year) IN (2001, 2002) and (d_year = 2002) and d_date_sk is not null) or ((d_year) IN (2001, 2002) and (d_year = 2001) and d_date_sk is not null)) (type: boolean) |
> |                   Statistics: Num rows: 73049 Data size: 876588 Basic stats: COMPLETE Column stats: COMPLETE |
> |                   Filter Operator                  |
> |                     predicate: ((d_year) IN (2001, 2002) and (d_year = 2002) and d_date_sk is not null) (type: boolean) |
> |                     Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE |
> {code}
> The "real" row count will be 365.
> When {{IN}} and {{=}} are used separately the estimate is very good, but when both are present on the same column the row count is (severely) underestimated.
> {code:java}
> set hive.query.results.cache.enabled=false;
> drop table if exists t1;
> drop table if exists t8;
> create table t1 (a integer,b integer);
> create table t8 like t1;
> insert into t1 values (1,1),(2,2),(3,3),(4,4),(5,5);
> insert into t8
> select * from t1 union all select * from t1 union all select * from t1 union all select * from t1 union all
> select * from t1 union all select * from t1 union all select * from t1 union all select * from t1
> ;
> analyze table t1 compute statistics for columns;
> analyze table t8 compute statistics for columns;
> explain analyze select sum(a) from t8 where b in (2,3) group by b;
> explain analyze select sum(a) from t8 where b=2 group by b;
> explain analyze select sum(a) from t1 where b in (2,3) and b=2 group by b;
> explain analyze select sum(a) from t8 where b in (2,3) and b=2 group by b;
> {code}
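
The estimate of 4 rows against a real count of 365 is consistent with the two predicates on d_year being treated as independent, so that their selectivities are multiplied. The sketch below only reproduces that arithmetic; it is not Hive's estimator code. The number of distinct d_year values (200) is an assumption, derived from 73049 daily rows spanning roughly 200 years, and does not come from the issue.

```python
def independent_estimate(num_rows, ndv, in_list_size):
    """Row estimate when `col IN (...)` and `col = const` are treated as independent."""
    in_selectivity = in_list_size / ndv  # assumes values are uniformly distributed
    eq_selectivity = 1 / ndv
    return num_rows * in_selectivity * eq_selectivity


def equality_only_estimate(num_rows, ndv):
    """Row estimate when the redundant IN is ignored, since `=` already implies it."""
    return num_rows / ndv


NUM_ROWS = 73049    # from the TableScan statistics in the plan above
ASSUMED_NDV = 200   # assumption: distinct d_year values, not stated in the issue

print(round(independent_estimate(NUM_ROWS, ASSUMED_NDV, 2)))  # 4
print(round(equality_only_estimate(NUM_ROWS, ASSUMED_NDV)))   # 365
```

Under these assumptions the multiplied selectivities give 4 rows, the same figure as the Filter Operator statistics, while the equality predicate alone gives the real count of 365.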



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
