hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-20260) NDV of a column shouldn't be scaled when row count is changed by filter on another column
Date Wed, 01 Aug 2018 18:50:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-20260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565809#comment-16565809 ]

Hive QA commented on HIVE-20260:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12933944/HIVE-20260.01.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 14839 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.TestMarkPartitionRemote.testMarkingPartitionSet (batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/12984/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12984/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12984/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12933944 - PreCommit-HIVE-Build

> NDV of a column shouldn't be scaled when row count is changed by filter on another column
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-20260
>                 URL: https://issues.apache.org/jira/browse/HIVE-20260
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Ashutosh Chauhan
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-20260.01.patch, HIVE-20260.01wip01.patch, HIVE-20260.01wip02.patch, HIVE-20260.01wip03.patch
>
>
> HIVE-17465 introduced progressive scaling of row counts in the presence of multiple filters. HIVE-19500 improved on that by also scaling column stats (NDV) in such scenarios. However, it should pay attention to the column used in each filter expression and not scale for all filters. E.g., consider the filter a = 1 and b = 2: the NDV of column b should not be scaled down by the row count changes caused by a = 1.
> Another way to say this is that the NDV of a particular column should be updated at the end of the row count computation for that operator.
> Here are the possible cases where our estimates can be accurate (or close to it):
> {code}
> case 1 - (d_year = 2001 and d_moy=1)
> case 2 - (d_year = 2001 and d_year IN (2001, 2002))
> case 3 - (d_year = 2001 and d_moy = 1 and d_dom = 1)
> case 4 - (d_date IN ('1999-01-02', '1999-01-02'))
> case 5 - (d_date = '1999-01-01')
> {code}
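
The effect described in the issue can be illustrated with a toy model. The sketch below is not Hive's estimator: the `Stats` class, both function names, the 1/NDV selectivity for equality predicates, and the linear rescaling of NDVs are all assumptions made for illustration. It compares rescaling every column's NDV after each conjunct with updating NDVs only once the row count for the whole predicate is known.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Hypothetical container for operator statistics (not a Hive class)."""
    rows: float
    ndv: dict  # column name -> number of distinct values


def estimate_scaling_every_step(stats, eq_columns):
    """Rescale the NDV of every column after each 'col = const' conjunct."""
    rows, ndv = stats.rows, dict(stats.ndv)
    for col in eq_columns:
        # Assumed uniformity: an equality filter keeps rows / NDV(col) rows.
        new_rows = rows / max(ndv[col], 1.0)
        ratio = new_rows / rows
        # Assumed linear scaling: every NDV shrinks with the row count,
        # including columns the conjunct never referenced.
        ndv = {c: max(1.0, v * ratio) for c, v in ndv.items()}
        ndv[col] = 1.0
        rows = new_rows
    return Stats(rows, ndv)


def estimate_updating_at_end(stats, eq_columns):
    """Use the original NDVs for every conjunct; update NDVs once at the end."""
    rows = stats.rows
    for col in eq_columns:
        rows = rows / max(stats.ndv[col], 1.0)
    cap = max(rows, 1.0)
    # A filtered column has one value left; others cannot exceed the row count.
    ndv = {c: 1.0 if c in eq_columns else min(v, cap)
           for c, v in stats.ndv.items()}
    return Stats(rows, ndv)


# Filter "a = 1 and b = 2" on a made-up table of 1200 rows.
table = Stats(rows=1200.0, ndv={"a": 10.0, "b": 12.0})
per_step = estimate_scaling_every_step(table, ["a", "b"])
at_end = estimate_updating_at_end(table, ["a", "b"])
print(per_step.rows, at_end.rows)
```

In this toy model the per-step variant shrinks the NDV of b from 12 to about 1.2 before b = 2 is evaluated, so that conjunct removes almost no rows and the estimate ends near 100 rows. Deferring the NDV update gives 1200 / 10 / 12 = 10 rows.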



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
