hive-issues mailing list archives

From "Hive QA (JIRA)" <>
Subject [jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics for external tables
Date Sat, 07 Oct 2017 16:12:02 GMT


Hive QA commented on HIVE-11266:

Here are the results of testing the latest attachment:

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11190 tests executed
*Failed tests:*
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan] (batchId=162)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] (batchId=239)


Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed

This message is automatically generated.

ATTACHMENT ID: 12890844 - PreCommit-HIVE-Build

> count(*) wrong result based on table statistics for external tables
> -------------------------------------------------------------------
>                 Key: HIVE-11266
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Simone Battaglia
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Blocker
>         Attachments: HIVE-11266.patch
> Hive returns a wrong count result on an external table with table statistics if I change the table's data files.
> This is the scenario in detail:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(*) from my_table;
> In this case the count query doesn't launch an MR job and returns a result based on table statistics. This result is wrong because it is based on statistics stored in the Hive metastore and doesn't take into account modifications made to the data files.
> Obviously, setting "hive.compute.query.using.stats" to FALSE avoids this problem, but the default value of this property is TRUE.
> I think that this post on Stack Overflow, which shows another type of bug in the case of multiple inserts, is also related to the one that I reported:
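
For readers following the report, the scenario and the workaround it mentions can be sketched in HiveQL roughly as follows. This is only an illustration: the column list and the location are elided in the report, so `id INT` and '/tmp/my_location' are hypothetical placeholders, and re-running ANALYZE to refresh the statistics is an assumption on my part rather than something the report proposes.

```sql
-- Steps 1-2 from the report (column and path are hypothetical placeholders).
CREATE EXTERNAL TABLE my_table (id INT) LOCATION '/tmp/my_location';
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- Step 3 happens outside Hive: files under the location are changed,
-- added, or deleted, so the stored row count may no longer match the data.

-- Step 4: with hive.compute.query.using.stats=true, this can be answered
-- from metastore statistics instead of scanning the files.
SELECT COUNT(*) FROM my_table;

-- Workaround named in the report: disable stats-based answers for the session,
-- so the count is computed from the data files.
SET hive.compute.query.using.stats=false;
SELECT COUNT(*) FROM my_table;

-- Assumed alternative: recompute the statistics after the files change.
ANALYZE TABLE my_table COMPUTE STATISTICS;
```

The SET statement affects only the current session. The second option keeps stats-based answers enabled but relies on statistics being refreshed after every external change to the files.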

This message was sent by Atlassian JIRA
