hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-20109) get rid of COLUMN_STATS_ACCURATE
Date Sat, 04 Aug 2018 03:00:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-20109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-20109:
------------------------------------
    Attachment: HIVE-20109.patch
                HIVE-20109.nogen.patch

> get rid of COLUMN_STATS_ACCURATE
> --------------------------------
>
>                 Key: HIVE-20109
>                 URL: https://issues.apache.org/jira/browse/HIVE-20109
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>         Attachments: HIVE-20109.nogen.patch, HIVE-20109.patch
>
>
> I don't know why anyone would come up with the idea of storing a set of booleans in a
> database using JSON. This has caused various problems in the past (text field limitations,
> perf issues when parsing a giant string; also bugs, because the way it is set is brittle).
> However, now that we are implementing transactional stats, it becomes especially problematic
> and error-prone, because the code in Hive sets COLUMN_STATS_ACCURATE (C_S_A) in random places
> with reckless abandon, whereas we want to change the state of the stats in well-defined
> places where txn semantics can be verified.
> Currently, in HIVE-19416, we are handling the random things that touch it (from the metastore
> itself to output committers, various stats tasks, commands like truncate, etc.) via a pile
> of hacks, but the best solution would be to remove it completely and replace it with a DB
> table, or columns in the stats tables, that would need to be set explicitly, not via the
> generic alter_table.
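
To make the contrast in the description concrete, here is a minimal sketch in Python. It is not Hive code. The JSON shape of the COLUMN_STATS_ACCURATE parameter is an assumption made for illustration, and the col_stats_state table and all function names are hypothetical. The first function shows the read-modify-write of a JSON string held in a generic parameter map, which any caller can do from anywhere. The remaining functions show an explicit per-column flag stored in a dedicated table.

```python
import json
import sqlite3

PARAM = "COLUMN_STATS_ACCURATE"  # table parameter named in the issue


def set_column_accurate_json(params, column, accurate):
    """Flip one flag by re-parsing and re-serializing the whole JSON string.

    Assumed (illustrative) shape: {"COLUMN_STATS": {"<col>": "true", ...}}.
    """
    state = json.loads(params.get(PARAM, "{}"))
    cols = state.setdefault("COLUMN_STATS", {})
    if accurate:
        cols[column] = "true"
    else:
        cols.pop(column, None)
    params[PARAM] = json.dumps(state, sort_keys=True)


def open_stats_db():
    """Create an in-memory DB with a hypothetical per-column state table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE col_stats_state ("
        " tbl_name TEXT, col_name TEXT, accurate INTEGER NOT NULL,"
        " PRIMARY KEY (tbl_name, col_name))"
    )
    return db


def set_column_accurate_db(db, tbl, col, accurate):
    """Set the flag through one explicit, typed write."""
    db.execute(
        "INSERT OR REPLACE INTO col_stats_state VALUES (?, ?, ?)",
        (tbl, col, int(accurate)),
    )


def is_column_accurate_db(db, tbl, col):
    """Read the flag; a missing row means the stats are not known accurate."""
    row = db.execute(
        "SELECT accurate FROM col_stats_state"
        " WHERE tbl_name = ? AND col_name = ?",
        (tbl, col),
    ).fetchone()
    return bool(row and row[0])
```

In the second variant the flag can only change through a dedicated write path, which is the property the description asks for: a well-defined place where transactional semantics can be checked.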



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
