hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-19891) inserting into external tables with custom partition directories may cause data loss
Date Thu, 14 Jun 2018 01:47:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-19891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-19891:
------------------------------------
    Description: 
tbl1 is only used as a prop to create data; it could just as well be an existing directory
for an external table.
Due to odd behavior in LoadTableDesc (some ancient code for overriding the old partition path),
the custom partition path is overwritten after the insert query, and the data in it ceases to
be part of the table. This can be seen in the desc formatted output with masking commented
out in QTestUtil.
This affects branch-1 too, so the bug is pretty old.

{noformat}
-- tbl1 only exists to produce a data directory for the test.
drop table tbl1;
CREATE TABLE tbl1 (index int, value int ) PARTITIONED BY ( created_date string );
insert into tbl1 partition(created_date='2018-02-01') VALUES (2, 2);

-- External table whose partition is pointed at a custom directory (tbl1's data).
CREATE external TABLE tbl2 (index int, value int ) PARTITIONED BY ( created_date string );
ALTER TABLE tbl2 ADD PARTITION(created_date='2018-02-01');
ALTER TABLE tbl2 PARTITION(created_date='2018-02-01') SET LOCATION 'file:/Users/sergey/git/hivegit/itests/qtest/target/warehouse/tbl1/created_date=2018-02-01';
-- State before the insert.
select * from tbl2;
describe formatted tbl2 partition(created_date='2018-02-01');
-- Insert, then compare the partition location and the visible rows with the state above.
insert into tbl2 partition(created_date='2018-02-01') VALUES (1, 1);
select * from tbl2;
describe formatted tbl2 partition(created_date='2018-02-01');
{noformat}
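
To check for the problem outside the qtest harness, one option is to capture the two "describe formatted" outputs from the script above and compare the partition location before and after the insert. The following is only a minimal sketch in Python, not part of Hive: the helper names (extract_location, location_changed) are hypothetical, the sample outputs are heavily abbreviated, the paths in them are placeholders, and it assumes the output contains a "Location:" row followed by whitespace and the path.

```python
import re
from typing import Optional

# Matches the "Location:" row of `describe formatted` output.
# Assumption: the row looks like "Location:<whitespace><path>".
_LOCATION_ROW = re.compile(r"^\s*Location:\s+(\S+)\s*$", re.MULTILINE)


def extract_location(desc_output: str) -> Optional[str]:
    """Return the location found in `describe formatted` text, or None."""
    match = _LOCATION_ROW.search(desc_output)
    return match.group(1) if match else None


def location_changed(before: str, after: str) -> bool:
    """True when the location differs between the two captured outputs."""
    return extract_location(before) != extract_location(after)


# Hypothetical, abbreviated outputs; both paths are placeholders.
BEFORE = """# Detailed Partition Information
Partition Value:    \t[2018-02-01]
Location:           \tfile:/warehouse/tbl1/created_date=2018-02-01
"""
AFTER = """# Detailed Partition Information
Partition Value:    \t[2018-02-01]
Location:           \tfile:/warehouse/tbl2/created_date=2018-02-01
"""

if __name__ == "__main__":
    print(extract_location(BEFORE))
    print(extract_location(AFTER))
    print(location_changed(BEFORE, AFTER))
```

If location_changed reports a difference for an insert that was not supposed to move the partition, the custom directory has been dropped from the table as described above.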


> inserting into external tables with custom partition directories may cause data loss
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-19891
>                 URL: https://issues.apache.org/jira/browse/HIVE-19891
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
