hadoop-common-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [hadoop] steveloughran commented on issue #1601: HADOOP-16635. S3A innerGetFileStatus scans for directories-only still does a HEAD.
Date Thu, 10 Oct 2019 13:11:24 GMT
steveloughran commented on issue #1601: HADOOP-16635. S3A innerGetFileStatus scans for directories-only
still does a HEAD.
URL: https://github.com/apache/hadoop/pull/1601#issuecomment-540571728
 
 
   Sid, thanks for the comments; I will review and update the patch.
   
   Interesting point about the double list. This code path is how it's always been, presumably
descended from the s3n code. LIST is slower, costs more, and is much more prone to eventual
consistency problems, which are all good arguments for HEAD first.
   
   I actually plan to tune some of the calls which always seem to get used on directory walks
(listStatus, listFiles, listLocatedStatus) to do the subtree list first, and only go for the
HEAD calls if they don't find any children. This is to reduce the cost of treewalks where
the bias is towards populated directories.
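
   A minimal sketch of the list-first ordering described above, to make the trade-off
concrete. It is not the S3A code: the Store interface, InMemoryStore, and probe() are
hypothetical names invented for illustration, and the in-memory store only counts calls
so that the saving on populated directories can be seen.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class ListFirstProbe {

    enum Kind { FILE, DIRECTORY, NOT_FOUND }

    /** Hypothetical stand-in for an object store client; not an S3A or AWS SDK API. */
    interface Store {
        /** HEAD-style probe: does an object exist at exactly this key? */
        boolean head(String key);

        /** LIST-style probe: is there at least one key under this prefix? */
        boolean hasChildren(String prefix);
    }

    /** In-memory store that counts calls, for illustration only. */
    static class InMemoryStore implements Store {
        private final TreeSet<String> keys = new TreeSet<>();
        int heads;
        int lists;

        InMemoryStore(String... initialKeys) {
            keys.addAll(Arrays.asList(initialKeys));
        }

        @Override
        public boolean head(String key) {
            heads++;
            return keys.contains(key);
        }

        @Override
        public boolean hasChildren(String prefix) {
            lists++;
            // Smallest key >= prefix; it is a child only if it starts with the prefix.
            String next = keys.ceiling(prefix);
            return next != null && next.startsWith(prefix);
        }
    }

    /**
     * List-first probe: check for children under "path/" first, and only
     * fall back to a HEAD on the path itself when the listing is empty.
     */
    static Kind probe(Store store, String path) {
        if (store.hasChildren(path + "/")) {
            return Kind.DIRECTORY;   // populated directory: no HEAD issued
        }
        if (store.head(path)) {
            return Kind.FILE;
        }
        return Kind.NOT_FOUND;
    }
}
```

   With this ordering a populated directory costs one LIST and no HEAD, while a file costs
one LIST plus one HEAD, so it only pays off when the walk is biased towards populated
directories.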

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services


