hive-issues mailing list archives

From Pau Tallada Crespí (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HIVE-12895) Bucket files not renamed with multiple insert overwrite table statements
Date Thu, 28 Jun 2018 13:31:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526304#comment-16526304 ]

Pau Tallada Crespí commented on HIVE-12895:
-------------------------------------------

Hi,

Any progress on this?

We just hit this bug with a single INSERT OVERWRITE into a dynamically partitioned table:

Tbl: PARTITIONED BY (col1) CLUSTERED BY (col2) INTO 2048 BUCKETS

INSERT OVERWRITE TABLE Tbl PARTITION (col1)
SELECT a, b, c, col2, col1
FROM other_table
JOIN another_table
ON condition
WHERE criteria
DISTRIBUTE BY col1;


> Bucket files not renamed with multiple insert overwrite table statements
> ------------------------------------------------------------------------
>
>                 Key: HIVE-12895
>                 URL: https://issues.apache.org/jira/browse/HIVE-12895
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Charles Pritchard
>            Priority: Major
>
> With two tables that have different CLUSTERED BY columns, using the multiple INSERT OVERWRITE
> TABLE syntax results in the output files of one of the tables being named "_bucket_number_0",
> when they clearly should have been renamed to the usual "00000_0" style. The temporary filename
> is not picked up by later SELECTs, which makes this a more urgent issue.
> This is with:
> Tbl1: CLUSTERED BY (col1) SORTED BY (col1) INTO 1 BUCKETS;
> Tbl2: CLUSTERED BY (col2) SORTED BY (col2) INTO 1 BUCKETS;
> FROM statement
> INSERT OVERWRITE TABLE tbl1 select...
> INSERT OVERWRITE TABLE tbl2 select...;



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
