hive-issues mailing list archives

From "Igor Dvorzhak (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
Date Thu, 15 Jul 2021 15:36:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-25277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Dvorzhak updated HIVE-25277:
---------------------------------
    Target Version/s: 2.3.9, 3.1.3, 4.0.0  (was: 2.3.6, 2.3.7, 3.1.2)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-25277
>                 URL: https://issues.apache.org/jira/browse/HIVE-25277
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: All Versions
>            Reporter: Zhou Fang
>            Assignee: Zhou Fang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when the warehouse is on a Cloud object store for
> which ListFiles is expensive. One root cause is that the recursive parent dir deletion is
> very inefficient: it makes many duplicated calls to isEmpty (which ultimately calls
> ListFiles). This fix sorts the parent paths to delete by path length and always processes
> the longest one first (e.g., a/b/c is always processed before a/b). As a result, each
> parent path needs to be checked only once.
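
The ordering idea in the description can be illustrated with a minimal Python sketch. This is not the Hive patch itself (the metastore is written in Java); it only shows the technique under stated assumptions. The names delete_empty_parents, FakeStore, is_empty and delete_dir are hypothetical, the stop directory /wh/t stands in for a table directory, and "path size" is read here as the number of path components.

```python
from pathlib import PurePosixPath


def delete_empty_parents(dropped_dirs, root, is_empty, delete_dir):
    """Remove now-empty ancestors of dropped_dirs, stopping below root.

    is_empty and delete_dir are hypothetical callbacks standing in for the
    object store client; is_empty represents the expensive ListFiles call.
    """
    root = PurePosixPath(root)
    candidates = set()
    for d in dropped_dirs:
        parent = PurePosixPath(d).parent
        # Walk upward, collecting each ancestor strictly below root once.
        while root in parent.parents:
            candidates.add(parent)
            parent = parent.parent

    removed = []
    # Deepest paths first, so a/b/c is always handled before a/b.
    for path in sorted(candidates, key=lambda p: len(p.parts), reverse=True):
        if is_empty(path):  # one listing call per candidate path
            delete_dir(path)
            removed.append(path)
    return removed


class FakeStore:
    """Tiny in-memory stand-in for an object store, for demonstration only."""

    def __init__(self, dirs):
        self.dirs = {PurePosixPath(d) for d in dirs}
        self.list_calls = 0

    def is_empty(self, path):
        self.list_calls += 1
        return not any(d.parent == path for d in self.dirs)

    def delete_dir(self, path):
        self.dirs.discard(path)


# State after three partition directories have already been dropped;
# only /wh/t/y=2021/m=02/d=02 still holds data.
store = FakeStore([
    "/wh/t/y=2020", "/wh/t/y=2020/m=01",
    "/wh/t/y=2021", "/wh/t/y=2021/m=02", "/wh/t/y=2021/m=02/d=02",
])
dropped = [
    "/wh/t/y=2020/m=01/d=01",
    "/wh/t/y=2020/m=01/d=02",
    "/wh/t/y=2021/m=02/d=01",
]
removed = delete_empty_parents(dropped, "/wh/t", store.is_empty, store.delete_dir)
```

In this example there are four distinct parent paths, so is_empty runs four times in total, however many dropped partitions share a parent. Because a child is always examined before its parent, the parent's emptiness check already sees the final state of its children and never has to be repeated.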



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
