hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-15209) PoC: DistCp to eliminate needless deletion of files under deleted directories
Date Mon, 05 Feb 2018 18:33:00 GMT
Steve Loughran created HADOOP-15209:
---------------------------------------

             Summary: PoC: DistCp to eliminate needless deletion of files under deleted directories
                 Key: HADOOP-15209
                 URL: https://issues.apache.org/jira/browse/HADOOP-15209
             Project: Hadoop Common
          Issue Type: Improvement
          Components: tools/distcp
    Affects Versions: 2.9.0
            Reporter: Steve Loughran


DistCp issues a delete(file) request even if the file is underneath an already deleted
directory. This generates needless load on filesystems and object stores, and if the store
throttles delete calls, it can dramatically slow down the delete operation.

If the DistCp delete operation keeps a history of the directories it has already deleted,
it can tell when a delete of a file or subdirectory beneath one of them is unnecessary and
skip it.
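A minimal sketch of such a history, assuming paths are plain absolute strings with '/' as the separator (real DistCp code works with org.apache.hadoop.fs.Path). The class name DeletedDirHistory and its methods are hypothetical, not an existing Hadoop API. Before issuing a delete, the caller walks up the path's ancestors and skips the call if any of them was already deleted recursively.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper: remembers directories already deleted recursively so
 * that deletes of paths beneath them can be skipped. Paths are assumed to be
 * absolute strings using '/' as the separator.
 */
final class DeletedDirHistory {
    private final Set<String> deletedDirs = new HashSet<>();

    /** Record that {@code dir} was deleted recursively. */
    void directoryDeleted(String dir) {
        deletedDirs.add(dir);
    }

    /**
     * Returns true if {@code path} still needs a delete call, i.e. neither
     * the path itself nor any ancestor has already been deleted.
     */
    boolean shouldDelete(String path) {
        String current = path;
        while (true) {
            if (deletedDirs.contains(current)) {
                return false;
            }
            int slash = current.lastIndexOf('/');
            if (slash <= 0) {
                // Only the root is left to check.
                return !deletedDirs.contains("/");
            }
            // Move up one level: "/data/a/b.txt" -> "/data/a".
            current = current.substring(0, slash);
        }
    }
}
```

Each check costs one hash lookup per path level, so it is cheap compared with a remote delete call. The set, however, grows with every deleted directory, which is exactly the heap concern raised below.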

Care is needed here to make sure that whatever structure is created does not overload the
heap of the process.
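One way to cap the memory, sketched here as an assumption rather than a settled design, is to keep only the most recently used deleted directories in an LRU map built on java.util.LinkedHashMap in access order. The class name BoundedDeletedDirHistory and the maxEntries parameter are hypothetical. Forgetting an entry is safe: the worst case is a redundant delete of a path that is already gone, never a skipped delete.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical bounded variant: remembers at most maxEntries deleted
 * directories, evicting the least recently used one when full, so heap use
 * stays constant no matter how many directories are deleted.
 */
final class BoundedDeletedDirHistory {
    private final Map<String, Boolean> recent;

    BoundedDeletedDirHistory(final int maxEntries) {
        // accessOrder = true: iteration order is least- to most-recently used.
        this.recent = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                // Evict once the map grows past its bound.
                return size() > maxEntries;
            }
        };
    }

    /** Record that {@code dir} was deleted recursively. */
    void directoryDeleted(String dir) {
        recent.put(dir, Boolean.TRUE);
    }

    /** True unless the path or one of its remembered ancestors was deleted. */
    boolean shouldDelete(String path) {
        String current = path;
        while (true) {
            // get() (unlike containsKey()) refreshes the entry's LRU position.
            if (recent.get(current) != null) {
                return false;
            }
            int slash = current.lastIndexOf('/');
            if (slash <= 0) {
                return recent.get("/") == null;
            }
            current = current.substring(0, slash);
        }
    }

    /** Number of directories currently remembered. */
    int size() {
        return recent.size();
    }
}
```

Because delete candidates beneath a directory tend to be processed close together, a modest bound may already catch most redundant deletes; the right size would need measuring against real listings.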



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


