hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16957) Support CTAS for auto gather column stats
Date Wed, 12 Dec 2018 01:49:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718351#comment-16718351 ]

Jesus Camacho Rodriguez commented on HIVE-16957:
------------------------------------------------

ALTER MATERIALIZED VIEW ... REBUILD is working correctly. When an incremental rebuild
translates into a MERGE operation, i.e., the MV contains a Group By clause, column stats are
not present afterwards, because the MERGE in turn contains an UPDATE operation, which currently
invalidates column stats. When an incremental rebuild translates into an INSERT operation,
i.e., the MV does not contain a Group By clause, column stats for the MV are updated correctly.
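
The two cases above could be checked with something like the sketch below. All object names (sales, mv_sales_agg, mv_sales_plain) are hypothetical, and it assumes a transactional ORC source table so that incremental rebuild is possible. Whether a given rebuild actually becomes a MERGE or an INSERT depends on the planner, and the final ANALYZE is only a suggested workaround, not something verified as part of this issue.

{code}
-- Hypothetical transactional source table.
create table sales (region string, amount int)
stored as orc tblproperties ('transactional'='true');

-- MV with a Group By: an incremental rebuild may translate into MERGE.
create materialized view mv_sales_agg as
select region, sum(amount) as total from sales group by region;

-- MV without a Group By: an incremental rebuild may translate into INSERT.
create materialized view mv_sales_plain as
select region, amount from sales where amount > 0;

-- Change the source so that there is something to rebuild.
insert into sales values ('emea', 10);

alter materialized view mv_sales_agg rebuild;
alter materialized view mv_sales_plain rebuild;

-- Inspect per-column stats; per the comment above, they are expected to be
-- missing for mv_sales_agg and present for mv_sales_plain.
describe formatted mv_sales_agg total;
describe formatted mv_sales_plain amount;

-- Assumed workaround: recompute the invalidated column stats explicitly.
analyze table mv_sales_agg compute statistics for columns;
{code}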

> Support CTAS for auto gather column stats
> -----------------------------------------
>
>                 Key: HIVE-16957
>                 URL: https://issues.apache.org/jira/browse/HIVE-16957
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Pengcheng Xiong
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>         Attachments: HIVE-16957.patch
>
>
> The idea is to rely as much as possible on the logic in ColumnStatsSemanticAnalyzer, as
> other operations do. In particular, they create an 'analyze table t compute statistics for
> columns' statement, use ColumnStatsSemanticAnalyzer to parse it, and connect the resulting
> plan to the existing INSERT/INSERT OVERWRITE statement. The challenge for CTAS or CREATE
> MATERIALIZED VIEW is that the table object does not exist yet, hence we cannot rely fully on
> ColumnStatsSemanticAnalyzer.
> Thus, we use the same process, but ColumnStatsSemanticAnalyzer produces a statement for
> column stats collection that uses a table values clause instead of the original table
> reference:
> {code}
> select compute_stats(col1), compute_stats(col2), compute_stats(col3)
> from table(values(cast(null as int), cast(null as int), cast(null as string))) as t(col1, col2, col3);
> {code}
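>
> As a rough illustration of the two situations described above, the sketch below contrasts an
> INSERT into an existing table with a CTAS. The table names (src, t, t_ctas) are hypothetical,
> and it assumes that automatic column stats gathering is enabled through
> hive.stats.column.autogather; the comments restate the description and are not a trace of
> the actual plan.
> {code}
> set hive.stats.column.autogather=true;
>
> -- Hypothetical source and target tables.
> create table src (col1 int, col2 int, col3 string);
> create table t (col1 int, col2 int, col3 string);
>
> -- Existing target: the stats plan can be derived from
> -- 'analyze table t compute statistics for columns', because t already exists.
> insert into t select col1, col2, col3 from src;
>
> -- CTAS: t_ctas does not exist while the statement is analyzed, so the stats
> -- query is built over a table values clause with matching column types.
> create table t_ctas as select col1, col2, col3 from src;
>
> -- Check that column stats were gathered for the new table.
> describe formatted t_ctas col1;
> {code}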



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
