hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14020) Optimize dirListingUnion
Date Thu, 26 Jan 2017 07:27:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839362#comment-15839362

Aaron Fabbri commented on HADOOP-14020:

Thanks for doing this [~mackrorysd].  Looks really good. Simple logic but huge improvement
in behavior.

Also impressed at your effort writing a test case for this.  I would have been tempted to
confirm behavior with log messages then call it good. 

Minor optimizations suggested below...

+  public boolean put(FileStatus childFileStatus) {
         "childFileStatus must be non-null");
     Path childPath = childStatusToPathKey(childFileStatus);
+    PathMetadata oldValue = listMap.get(childPath);
+    PathMetadata newValue = new PathMetadata(childFileStatus);
+    if (oldValue != null && oldValue.equals(newValue)) {
+      return false;
+    }
     listMap.put(childPath, new PathMetadata(childFileStatus));
How about listMap.put(childPath, newValue) here.  Save a little garbage.

   public static FileStatus[] dirListingUnion(MetadataStore ms, Path path,
-      List<FileStatus> backingStatuses, DirListingMetadata dirMeta)
-      throws IOException {
+        List<FileStatus> backingStatuses, DirListingMetadata dirMeta,
+      Configuration conf) throws IOException {
Looks like you only use conf to check the value of Constants.METADATASTORE_AUTHORITATIVE.

Can we just pass in a boolean isMetadataStoreAuthoritative instead of doing a conf.get() each
time?  More efficient (and explicit).

> Optimize dirListingUnion
> ------------------------
>                 Key: HADOOP-14020
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14020
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-14020-HADOOP-13345.001.patch, HADOOP-14020-HADOOP-13345.002.patch
> There's a TODO in dirListingUnion:
> {quote}// TODO optimize for when allowAuthoritative = false{quote}
> There will be cases when we can intelligently avoid a round trip: if S3A results are
a subset or the metadatastore results (including them being equal or empty) then writing back
will do nothing (although perhaps that should set the authoritative flag if it isn't set already).
> There may also be cases where users want to just skip that altogether. It's wasted work
if authoritative mode is disabled, so perhaps we want to trigger a skip if that's false, or
perhaps it should be a separate property. First one makes for simpler config, second is more

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message