hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-20260) NDV of a column shouldn't be scaled when row count is changed by filter on another column
Date Wed, 01 Aug 2018 18:18:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-20260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565758#comment-16565758 ]

Hive QA commented on HIVE-20260:
--------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m  4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 41s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 37s{color} | {color:blue} ql in master has 2301 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 10s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 40s{color} | {color:red} ql: The patch generated 3 new + 21 unchanged - 28 fixed = 24 total (was 49) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch 4 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  5m  0s{color} | {color:red} ql generated 7 new + 2294 unchanged - 7 fixed = 2301 total (was 2301) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 14s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Boxing/unboxing to parse a primitive org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long)  At StatsRulesProcFactory.java:[line 935] |
|  |  Boxing/unboxing to parse a primitive org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long)  At StatsRulesProcFactory.java:[line 956] |
|  |  org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long) invokes inefficient new Byte(String) constructor; use Byte.valueOf(String) instead  At StatsRulesProcFactory.java:[line 891] |
|  |  org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long) invokes inefficient new Integer(String) constructor; use Integer.valueOf(String) instead  At StatsRulesProcFactory.java:[line 935] |
|  |  org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long) invokes inefficient new Long(String) constructor; use Long.valueOf(String) instead  At StatsRulesProcFactory.java:[line 956] |
|  |  org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long) invokes inefficient new Short(String) constructor; use Short.valueOf(String) instead  At StatsRulesProcFactory.java:[line 910] |
|  |  Comparison of String objects using == or != in org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long)  At StatsRulesProcFactory.java:[line 931] |
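
The three kinds of FindBugs warning listed above (boxing just to parse a primitive, the `new Integer(String)`-style constructors, and `==` on strings) usually have small mechanical fixes. The sketch below is only an illustration of those fixes: the class, the method names and the "bigint" type name are hypothetical and are not taken from StatsRulesProcFactory or from the patch.

```java
import java.util.Objects;

// Hypothetical illustration of the usual fixes; this is NOT the Hive code.
final class FindBugsFixSketch {

    // Parse directly to the primitive rather than building a boxed
    // object with `new Long(s)` / `new Integer(s)` and unboxing it.
    static long parseLongLiteral(String literal) {
        return Long.parseLong(literal);
    }

    static int parseIntLiteral(String literal) {
        return Integer.parseInt(literal);
    }

    static short parseShortLiteral(String literal) {
        return Short.parseShort(literal);
    }

    static byte parseByteLiteral(String literal) {
        return Byte.parseByte(literal);
    }

    // Compare string contents; `==` only compares object references.
    // Objects.equals also tolerates null on either side.
    static boolean sameTypeName(String left, String right) {
        return Objects.equals(left, right);
    }

    public static void main(String[] args) {
        // A distinct String object with the same contents as the literal.
        String typeName = new String("bigint");
        System.out.println(sameTypeName(typeName, "bigint"));
        System.out.println(parseLongLiteral("2001") + 1);
    }
}
```

When a boxed value is really needed, `Long.valueOf(String)` and its siblings (the replacement the warnings name) avoid the constructor; the `parseXxx` methods fit the case where the result is used as a primitive.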
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-12984/dev-support/hive-personality.sh |
| git revision | master / 4d43695 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-12984/yetus/diff-checkstyle-ql.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-12984/yetus/whitespace-eol.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-12984/yetus/whitespace-tabs.txt |
| findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-12984/yetus/new-findbugs-ql.html |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-12984/yetus.txt |
| Powered by | Apache Yetus    http://yetus.apache.org |


This message was automatically generated.



> NDV of a column shouldn't be scaled when row count is changed by filter on another column
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-20260
>                 URL: https://issues.apache.org/jira/browse/HIVE-20260
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Ashutosh Chauhan
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-20260.01.patch, HIVE-20260.01wip01.patch, HIVE-20260.01wip02.patch, HIVE-20260.01wip03.patch
>
>
> HIVE-17465 introduced progressive scaling of row counts in the presence of multiple filters. HIVE-19500 improved on that by also scaling column statistics (NDV) in such scenarios. However, it should pay attention to the column used in each filter expression and not scale for all filters.
> For example, given the filter a = 1 and b = 2, the NDV of column b should not be scaled down by the row count change caused by a = 1.
> Another way to say this is that the NDV of a particular column should be updated at the end of the row count computation for that operator.
> Here are the possible cases where our estimates can be accurate (or close to it):
> {code}
> case 1 - (d_year = 2001 and d_moy=1)
> case 2 - (d_year = 2001 and d_year IN (2001, 2002))
> case 3 - (d_year = 2001 and d_moy = 1 and d_dom = 1)
> case 4 - (d_date IN ('1999-01-02', '1999-01-02'))
> case 5 - (d_date = '1999-01-01')
> {code}
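
To make the a = 1 and b = 2 argument concrete, here is a small arithmetic sketch. It is not Hive's estimator: the class, the method names, the uniform and independent column assumption, and the numbers (1000 rows, NDV 10 for a, NDV 50 for b) are all assumptions chosen for illustration. It compares scaling b's NDV in proportion to the row count drop caused by a = 1 against leaving it alone apart from capping it at the new row count.

```java
// Hypothetical sketch of the estimation arithmetic; not the Hive implementation.
final class NdvSketch {

    // Scale the NDV in proportion to the row count change. This is the
    // behaviour the issue objects to for columns not used in the filter.
    static long proportionalNdv(long ndv, long oldRows, long newRows) {
        if (oldRows <= 0) {
            return ndv;
        }
        return Math.max(1L, ndv * newRows / oldRows);
    }

    // Leave the NDV alone, except that it cannot exceed the row count.
    static long cappedNdv(long ndv, long newRows) {
        return Math.max(1L, Math.min(ndv, newRows));
    }

    // Equality predicate on a uniformly distributed column: rows / NDV.
    static long rowsAfterEquality(long rows, long ndv) {
        return Math.max(1L, rows / Math.max(1L, ndv));
    }

    public static void main(String[] args) {
        long rows = 1000, ndvA = 10, ndvB = 50;

        // Apply a = 1 first.
        long rowsAfterA = rowsAfterEquality(rows, ndvA);

        // Then apply b = 2 with each choice of NDV for b.
        long scaled = rowsAfterEquality(rowsAfterA, proportionalNdv(ndvB, rows, rowsAfterA));
        long capped = rowsAfterEquality(rowsAfterA, cappedNdv(ndvB, rowsAfterA));

        System.out.println("rows after a = 1: " + rowsAfterA);
        System.out.println("b = 2 with proportionally scaled NDV: " + scaled);
        System.out.println("b = 2 with capped NDV: " + capped);
    }
}
```

Under these assumptions the independent estimate is 1000 / (10 * 50) = 2 rows. The capped variant reproduces it, while proportional scaling shrinks b's NDV from 50 to 5 and overestimates the result at 20 rows.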



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
