hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15191) Add Private/Unstable BulkDelete operations to supporting object stores for DistCP
Date Fri, 02 Feb 2018 05:48:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349819#comment-16349819 ]

Steve Loughran commented on HADOOP-15191:
-----------------------------------------

Trace of a run. I'd expected the missing files to be queued for bulk deletion, which they are, but
lots of directory deletions kick off too. This means the bulk ops aren't needed here, and indeed
the attempt to be clever there and create parent dirs is wasted.
{code}
2018-02-01 21:44:44,756 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(343))
- -delete option is enabled. About to remove entries from target that are missing in source
2018-02-01 21:44:46,064 [Thread-124] INFO  tools.SimpleCopyListing (SimpleCopyListing.java:printStats(608))
- Paths (files+dirs) cnt = 20; dirCnt = 10
2018-02-01 21:44:46,064 [Thread-124] INFO  tools.SimpleCopyListing (SimpleCopyListing.java:doBuildListing(402))
- Build file listing completed.
2018-02-01 21:44:46,080 [Thread-124] INFO  tools.DistCp (CopyListing.java:buildListing(94))
- Number of paths in the copy list: 20
2018-02-01 21:44:46,095 [Thread-124] INFO  tools.DistCp (CopyListing.java:buildListing(94))
- Number of paths in the copy list: 20
2018-02-01 21:44:46,109 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(385))
- Listing completed in 0:00:01.352
2018-02-01 21:44:46,109 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(405))
- Destination filesystem supports bulk deletes, maximum size 2
2018-02-01 21:44:46,390 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(434))
- Deleted directory s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir/inputDir
- Missing at source
2018-02-01 21:44:46,390 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(446))
- Queueing for bulk delete file s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir/inputDir/file1
2018-02-01 21:44:46,507 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(434))
- Deleted directory s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir/inputDir/subDir1
- Missing at source
2018-02-01 21:44:46,508 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(446))
- Queueing for bulk delete file s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir/inputDir/subDir1/file2
2018-02-01 21:44:46,508 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(450))
- Initiating bulk delete of size 2
2018-02-01 21:44:46,512 [Thread-124] INFO  s3a.S3AFileSystem (S3ABulkOperations.java:lambda$bulkDeleteFiles$0(157))
- Deleting 2 objects
2018-02-01 21:44:46,596 [Thread-124] INFO  s3a.S3AFileSystem (S3ABulkOperations.java:maybeMkParentDirs(228))
- Number of directories to try creating: 1
2018-02-01 21:44:46,784 [Thread-124] INFO  s3a.S3AFileSystem (S3ABulkOperations.java:maybeMkParentDirs(237))
- Number of created directories: 1 
2018-02-01 21:44:46,923 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(434))
- Deleted directory s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir/inputDir/subDir2
- Missing at source
2018-02-01 21:44:48,013 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(434))
- Deleted directory s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir/inputDir/subDir2/subDir3
- Missing at source
2018-02-01 21:44:48,014 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(446))
- Queueing for bulk delete file s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir/inputDir/subDir2/subDir3/file3
2018-02-01 21:44:48,014 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(446))
- Queueing for bulk delete file s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir/inputDir/subDir2/subDir3/file4
2018-02-01 21:44:48,014 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(450))
- Initiating bulk delete of size 2
2018-02-01 21:44:48,014 [Thread-124] INFO  s3a.S3AFileSystem (S3ABulkOperations.java:lambda$bulkDeleteFiles$0(157))
- Deleting 2 objects
2018-02-01 21:44:48,168 [Thread-124] INFO  s3a.S3AFileSystem (S3ABulkOperations.java:maybeMkParentDirs(228))
- Number of directories to try creating: 1
2018-02-01 21:44:48,461 [Thread-124] INFO  s3a.S3AFileSystem (S3ABulkOperations.java:maybeMkParentDirs(237))
- Number of created directories: 1 
2018-02-01 21:44:48,461 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(446))
- Queueing for bulk delete file s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir/inputDir/subDir2/subDir3/newfile1
2018-02-01 21:44:48,874 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(434))
- Deleted directory s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir/inputDir/subDir4
- Missing at source
2018-02-01 21:44:48,980 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(434))
- Deleted directory s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir/inputDir/subDir4/subDir4
- Missing at source
2018-02-01 21:44:48,980 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(446))
- Queueing for bulk delete file s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir/inputDir/subDir4/subDir4/file4
2018-02-01 21:44:48,980 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(450))
- Initiating bulk delete of size 2
2018-02-01 21:44:48,980 [Thread-124] INFO  s3a.S3AFileSystem (S3ABulkOperations.java:lambda$bulkDeleteFiles$0(157))
- Deleting 2 objects
2018-02-01 21:44:49,005 [Thread-124] INFO  s3a.S3AFileSystem (S3ABulkOperations.java:maybeMkParentDirs(228))
- Number of directories to try creating: 2
2018-02-01 21:44:49,281 [Thread-124] INFO  s3a.S3AFileSystem (S3ABulkOperations.java:maybeMkParentDirs(237))
- Number of created directories: 1 
2018-02-01 21:44:49,281 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(446))
- Queueing for bulk delete file s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir/inputDir/subDir4/subDir4/file5
2018-02-01 21:44:49,282 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(467))
- Initiating final bulk delete of size 1
2018-02-01 21:44:49,426 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(476))
- Deleted from target: s3a://hwdev-steve-new/test/ITestS3AContractDistCp/deepDirectoryStructureToRemoteWithSync/outputDir
entries: files: 7 directories: 6
2018-02-01 21:44:49,427 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:deleteMissing(478))
- Time to delete: 0:00:03.317
2018-02-01 21:44:49,427 [Thread-124] INFO  mapred.CopyCommitter (CopyCommitter.java:cleanup(179))
- Cleaning up temporary work folder: file:/tmp/hadoop/mapred/staging/stevel368042071/.staging/_distcp445682291
2018-02-01 21:44:49,516 [Thread-0] INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1658))
- Job job_local237798756_0002 completed successfully
{code}
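
To illustrate the point the trace makes, here is a minimal sketch of a delete pass that skips anything beneath a directory it has already deleted and pages the remaining files into bulk deletes. It is not the code in the attached patches. The names {{DeleteMissingSketch}}, {{Plan}} and {{plan}} are hypothetical, paths are plain strings rather than Hadoop {{Path}} objects, directories are assumed to be marked with a trailing "/", the listing is assumed to be sorted so a directory precedes its children, and directory deletes are assumed to be recursive.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; not taken from the HADOOP-15191 patches. */
final class DeleteMissingSketch {

  /** Outcome of a planning pass: directory deletes plus pages of file deletes. */
  static final class Plan {
    final List<String> directoryDeletes = new ArrayList<>();
    final List<List<String>> bulkPages = new ArrayList<>();
  }

  /**
   * Plans the deletions for a sorted list of target paths missing at source.
   * Assumption: directories end with "/" and sort before their children.
   */
  static Plan plan(List<String> sortedMissing, int pageSize) {
    if (pageSize <= 0) {
      throw new IllegalArgumentException("pageSize must be positive");
    }
    Plan plan = new Plan();
    List<String> page = new ArrayList<>();
    String lastDeletedDir = null;
    for (String path : sortedMissing) {
      // Anything under an already-deleted directory went with it
      // (assuming a recursive delete), so it needs no further call.
      if (lastDeletedDir != null && path.startsWith(lastDeletedDir)) {
        continue;
      }
      if (path.endsWith("/")) {
        plan.directoryDeletes.add(path);
        lastDeletedDir = path;
      } else {
        page.add(path);
        if (page.size() == pageSize) {
          // Page is full: hand it over as one bulk delete.
          plan.bulkPages.add(page);
          page = new ArrayList<>();
        }
      }
    }
    // Final, possibly short, page.
    if (!page.isEmpty()) {
      plan.bulkPages.add(page);
    }
    return plan;
  }
}
```

With a page size of 2, as in the run above, a listing whose entries all sit under one deleted directory would yield a single directory delete and no bulk pages at all.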

> Add Private/Unstable BulkDelete operations to supporting object stores for DistCP
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-15191
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15191
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, tools/distcp
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>         Attachments: HADOOP-15191-001.patch, HADOOP-15191-002.patch, HADOOP-15191-003.patch,
HADOOP-15191-004.patch
>
>
> Large-scale DistCP with the -delete option doesn't finish in a viable time because the
> final CopyCommitter does a one-by-one delete of all missing files. This isn't randomized
> (the list is sorted), and it's throttled by AWS.
> If bulk deletion of files were exposed as an API, DistCP would make 1/1000 of the REST calls
> and so not get throttled.
> Proposed: add an initially private/unstable interface for stores, {{BulkDelete}}, which
> declares a page size and offers a {{bulkDelete(List<Path>)}} operation for the bulk
> deletion.
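
As a rough illustration of the proposal, the sketch below shows one possible shape for such an interface and a caller that pages through it. Only the name {{BulkDelete}}, the idea of a declared page size and the {{bulkDelete}} operation come from the description; the method name {{getBulkDeletePageSize}}, the return type, the helper {{BulkDeleteClient}} and the use of {{String}} in place of Hadoop's {{Path}} are assumptions made so the example compiles without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shape of the proposed interface; String stands in for Path. */
interface BulkDelete {

  /** Maximum number of paths the store accepts in one bulk call. */
  int getBulkDeletePageSize();

  /** Deletes the given files in one store request; returns how many were submitted. */
  int bulkDelete(List<String> paths);
}

/** Hypothetical caller that splits a long list into store-sized pages. */
final class BulkDeleteClient {

  /** Issues one bulkDelete call per page and returns the number of calls made. */
  static int deleteAll(BulkDelete store, List<String> paths) {
    int pageSize = store.getBulkDeletePageSize();
    if (pageSize <= 0) {
      throw new IllegalArgumentException("page size must be positive");
    }
    int calls = 0;
    for (int start = 0; start < paths.size(); start += pageSize) {
      int end = Math.min(start + pageSize, paths.size());
      // Copy the page so the store never sees a view of the caller's list.
      store.bulkDelete(new ArrayList<>(paths.subList(start, end)));
      calls++;
    }
    return calls;
  }
}
```

With a page size of 1000, deleting 2500 files through this helper takes three calls rather than 2500, which is where the 1/1000 figure comes from.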



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

