hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-19830) Inconsistent behavior when multiple partitions point to the same location
Date Sat, 09 Jun 2018 00:00:36 GMT

    [ https://issues.apache.org/jira/browse/HIVE-19830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506717#comment-16506717 ]

Sergey Shelukhin commented on HIVE-19830:
-----------------------------------------

The 2nd issue is by design. Pointing multiple partitions at the same location is not supported
for regular (managed) tables.
Hive assumes that it manages the data, so it deletes the directory when the partition is dropped.
For the case where Hive should not manage the data, an external table should be used.
The first issue does look like it could be a bug for external tables, but, again, such a use case
is not supported for regular tables.
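
To illustrate the external-table suggestion, here is a minimal HiveQL sketch, not a verified reproduction. The table name test_ext is hypothetical, and the location simply reuses the path from the report below. It assumes default settings, under which dropping a partition of an external table removes only the metastore entry and leaves the files in place.

```sql
-- Hypothetical external table; Hive is not expected to own the data under LOCATION.
CREATE EXTERNAL TABLE test_ext (i INT) PARTITIONED BY (j INT) STORED AS PARQUET;

-- Both partitions point at the same directory, as in the report.
ALTER TABLE test_ext ADD PARTITION (j=1) LOCATION 'hdfs://localhost:20500/test-warehouse/test/j=1';
ALTER TABLE test_ext ADD PARTITION (j=2) LOCATION 'hdfs://localhost:20500/test-warehouse/test/j=1';

-- Assumption: for an external table this drops only the partition metadata,
-- so the shared directory should remain available to partition j=1.
ALTER TABLE test_ext DROP PARTITION (j=2);
SHOW PARTITIONS test_ext;
```

Whether the aggregate query from Issue #1 behaves correctly on such a table is exactly the open question raised above, so this sketch makes no claim about its result.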

> Inconsistent behavior when multiple partitions point to the same location
> -------------------------------------------------------------------------
>
>                 Key: HIVE-19830
>                 URL: https://issues.apache.org/jira/browse/HIVE-19830
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.4.0
>            Reporter: Gabor Kaszab
>            Assignee: Adam Szita
>            Priority: Major
>
> // Create a table with 2 partitions that share the same location, then insert a single row
> // into one of them.
> create table test (i int) partitioned by (j int) stored as parquet;
> alter table test add partition (j=1) location 'hdfs://localhost:20500/test-warehouse/test/j=1';
> alter table test add partition (j=2) location 'hdfs://localhost:20500/test-warehouse/test/j=1';
> insert into table test partition (j=1) values (1);
> // select * shows this single row in both partitions, as expected.
> select * from test;
> 1 1
> 1 2
> // however, sum() doesn't add up the line for all the partitions. This is +Issue #1+.
> select sum( i), sum(j) from test;
> 1 2
> // On the file system there is a single common directory for the 2 partitions, as expected.
> hdfs dfs -ls hdfs://localhost:20500/test-warehouse/test/
> Found 1 items
> drwxr-xr-x - gaborkaszab supergroup 0 2018-06-08 10:54 hdfs://localhost:20500/test-warehouse/test/j=1
> // Let's drop one of the partitions now!
> alter table test drop partition (j=2);
> // Running the same hdfs dfs -ls command shows that the j=1 directory has been deleted. I think
> // this is good behavior; we just have to document that this is the expected case.
> // select * from test; returns zero rows, which is still as expected.
> // Even though the directory has been deleted, the j=1 partition is still visible with show partitions.
> // This is +Issue #2+.
> show partitions test;
> j=1
> After Hive drops the directory, Impala reloads its partitions by asking Hive
> which partitions exist. Apparently, Hive returns a list that includes the j=1 partition,
> so Impala treats it as an existing one and doesn't drop it from the Catalog's cache.
> Hive shouldn't return that partition here. This is +Issue #3+.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
