hive-issues mailing list archives

From "Vihang Karajgaonkar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-20198) Constant time table drops/renames
Date Wed, 18 Jul 2018 17:15:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548105#comment-16548105
] 

Vihang Karajgaonkar commented on HIVE-20198:
--------------------------------------------

+1 to the idea. For a managed table with impersonation turned off, the owner of the
HDFS location for the table and its partitions is hive. I think it makes sense to let Hive
manage the table location for a managed_table. Having a UUID-based table location can provide
a lot of performance advantages in terms of async drops and renames.

In the case of external tables, the HDFS data is currently not deleted anyway, so this approach
can be used for external tables as well. A drop command on an external table should do a quick
metadata operation that marks the table as deleted (maybe by prefixing the table name with
a special string + its UUID).
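
A minimal sketch of that soft drop, assuming a plain dict stands in for the metastore.
The `__deleted__` prefix, the `soft_drop` and `visible_tables` helpers, and the warehouse
path are all hypothetical names chosen for illustration, not Hive or HMS APIs:

```python
import uuid

DELETED_PREFIX = "__deleted__"  # hypothetical marker, not a Hive constant


def soft_drop(tables: dict, name: str) -> str:
    """Hide a table by renaming its metadata entry; no data is touched."""
    table = tables.pop(name)
    hidden = f"{DELETED_PREFIX}{name}_{table['uuid']}"
    tables[hidden] = table
    return hidden


def visible_tables(tables: dict) -> list:
    """What a public listing call would return: dropped tables are skipped."""
    return sorted(n for n in tables if not n.startswith(DELETED_PREFIX))


# Toy metastore with one external table (path is made up).
tables = {"sales": {"uuid": uuid.uuid4().hex, "location": "/warehouse/ext/sales"}}
hidden = soft_drop(tables, "sales")
print(visible_tables(tables))
```

Including the UUID in the hidden name means a new table called `sales` can be created and
dropped again without colliding with the earlier entry.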

Surprisingly, in the case of table renames on external tables, I see that HMS does not update
the table in the partitions for that table (which seems wrong to me).

> Constant time table drops/renames
> ---------------------------------
>
>                 Key: HIVE-20198
>                 URL: https://issues.apache.org/jira/browse/HIVE-20198
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 4.0.0
>            Reporter: Alexander Kolbasov
>            Priority: Major
>
> Currently table drops and table renames have O(P) performance (where P is the number
of partitions). When a managed table is deleted, the implementation deletes table metadata
and then deletes all partitions in HDFS. HDFS operations are optimized and only do sequential
deletes for partitions outside of the table prefix. This operation is O(P), where P is the number
of partitions.
> Table rename goes through the list of partitions and modifies the table name (and potentially
the db name) in each partition. It also modifies each partition location to match the new db/table
name and renames directories (which is a non-atomic and slow operation on S3). This is an O(P)
operation, where P is the number of partitions.
> The basic idea is to do the following:
> # Assign a unique ID to each table
> # Create the directory name based on the unique ID rather than the name
> # Table rename then becomes a metadata-only operation - there is no need to change any
location information.
> # Table drop can become an asynchronous operation where the table is marked as "deleted".
Subsequent public metadata APIs should skip such tables. A background cleaner thread may then
go and clean up directories.
> Since the table location is unique for each table, new tables will not reuse existing
locations. This change isn't compatible with the current behavior, where there is an assumption
that the table location is based on the table name. We can get around this by providing an "opt-in"
mechanism: a special table property indicating that the table can have the new behavior, so the
improvement will initially work for new tables created with this feature enabled. We may later
provide some tool to convert existing tables to the new scheme.
> One complication arises when impersonation is enabled - the FS operations
should be performed using the client UGI rather than the server's, so the cleaner thread should be
able to use client UGIs.
> Initially we can punt on this and do standard table drops when impersonation is enabled.
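
The scheme in the quoted description can be illustrated with a small in-memory model. This
is only a sketch under stated assumptions: the `Table` and `Metastore` classes, the
`/warehouse/<id>` layout, and the `removed_dirs` list standing in for file system deletes
are all hypothetical, and impersonation is not modeled.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    partitions: list
    table_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    deleted: bool = False

    @property
    def location(self) -> str:
        # Derived from the ID, never from the name (hypothetical layout).
        return f"/warehouse/{self.table_id}"


class Metastore:
    def __init__(self):
        self.tables = {}        # table_id -> Table
        self.removed_dirs = []  # stands in for real file system deletes

    def create(self, name, partitions):
        table = Table(name, list(partitions))
        self.tables[table.table_id] = table
        return table

    def rename(self, table, new_name):
        # One metadata write; no partition or directory is touched.
        table.name = new_name

    def drop(self, table):
        # Constant-time drop: only mark the table as deleted.
        table.deleted = True

    def list_names(self):
        # Public listing skips tables marked as deleted.
        return [t.name for t in self.tables.values() if not t.deleted]

    def run_cleaner(self):
        # Background step: remove directories of tables marked as deleted.
        doomed = [tid for tid, t in self.tables.items() if t.deleted]
        for tid in doomed:
            self.removed_dirs.append(self.tables.pop(tid).location)


ms = Metastore()
t = ms.create("events", ["ds=2018-07-17", "ds=2018-07-18"])
location_before = t.location
ms.rename(t, "events_v2")   # location is unchanged by the rename
ms.drop(t)
names_after_drop = ms.list_names()
ms.run_cleaner()
```

In this model, rename and drop cost the same regardless of how many partitions the table
has; the O(P) work moves into `run_cleaner`, which can run off the request path.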



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
