hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13760) S3Guard: add delete tracking
Date Wed, 10 May 2017 06:38:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004155#comment-16004155
] 

Aaron Fabbri commented on HADOOP-13760:
---------------------------------------

I did not finish a full review yet but I have a couple of comments to start...

{code}
   /** Map of key to delay -> time it was created. */
-  private Map<String, Long> delayedKeys = new HashMap<>();
+  private Map<String, Long> delayedPutKeys = new HashMap<>();
+
+  /** Map of key to delay -> time it was deleted. */
+  private Map<String, Long> delayedDeleteKeys = new HashMap<>();
+
+  /** Map of delayed deleted key -> object summary. */
+  private Map<String, S3ObjectSummary> delayedDeleteSummaries = new HashMap<>();
+
+  /** Set of delayed prefixes (may include directories not in
+   * delayedDeletedSummaries */
+  private Set<String> delayedDeletePrefixes = new HashSet<>();
+
{code}

Curious if we can combine three deleted item maps to a single map holding
a struct instead...

{code}
+  private ObjectListing restoreListObjects(ListObjectsRequest request,
+                                           ObjectListing rawListing) {
+    List<S3ObjectSummary> outputList = rawListing.getObjectSummaries();
+    for (String key : new HashSet<>(delayedDeleteSummaries.keySet())) {
{code}

Why create a temporary HashSet just to iterate over keys?
{code}
+      if (isKeyDelayed(delayedDeleteKeys, key)) {
+        outputList.add(delayedDeleteSummaries.get(key));
{code}
(1) can this add duplicates?  (2) How do you know this delayed item (e.g.
/a/b/c) belongs in this listing request (e.g. list /x/y/x)?

{code}
+    if (createOnS3) {
       String key = pathToKey(f);
       createFakeDirectory(key);
-      S3Guard.makeDirsOrdered(metadataStore, metadataStoreDirs, username);
-      deleteUnnecessaryFakeDirectories(f.getParent());
-      return true;
     }
+    S3Guard.makeDirsOrdered(metadataStore, metadataStoreDirs, username,
+        getConf().getBoolean(
+            Constants.METADATASTORE_AUTHORITATIVE,
+            Constants.DEFAULT_METADATASTORE_AUTHORITATIVE));;
{code}
I don't think we want this new argument added to {{makeDirsOrdered}}.   makeDirsOrdered
only operates on newly created directories. This means, by definition, we
know the full contents of all listed directories, since they were just created.

This is a common point of confusion.  I should have named the two types of
"authoritative" separately:

1. DirListingMetadata.isAuthoritative(): True iff it contains the full listing
of the directory.  (Maybe I should rename this isCompleteListing())

2. fs.s3a.metadatastore.authoriatative:  S3A behavior:  When true, S3A is allowed
to skip round trips to S3 if the MetadataStore provides a full listing.

#2 generally should not need to be plumbed outside of S3AFileSystem.

{code}
-  private S3AFileStatus s3GetFileStatus(final Path path, String key)
-      throws IOException {
+  private S3AFileStatus s3GetFileStatus(final Path path, String key,
+      Set<Path> tombstones) throws IOException {
{code}

My first thought is: can we do this as a wrapper function?  This is supposed to be s3-level
logic but we're adding metadatastore tombstones to it.  Let me think about this a bit.

{code}
+        boolean isEmpty = true;
+        if (tombstones != null) {
+          for (String prefix : prefixes) {
+            Path absolute = qualify(new Path("/" + prefix));
+            if (!tombstones.contains(absolute)) {
+              isEmpty = false;
+              break;
+            }
+          }
+          if (isEmpty) {
+            for (S3ObjectSummary summary : summaries) {
+              Path absolute = qualify(new Path("/" + summary.getKey()));
+              if (!tombstones.contains(absolute)) {
+                isEmpty = false;
+                break;
+              }
+            }
+          }
+        }
+        return new S3AFileStatus(Tristate.fromBool(isEmpty), path, username);
{code}
Related: this would also be easier to unit test if separated into its own function.




> S3Guard: add delete tracking
> ----------------------------
>
>                 Key: HADOOP-13760
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13760
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-13760-HADOOP-13345.001.patch, HADOOP-13760-HADOOP-13345.002.patch,
HADOOP-13760-HADOOP-13345.003.patch, HADOOP-13760-HADOOP-13345.004.patch, HADOOP-13760-HADOOP-13345.005.patch,
HADOOP-13760-HADOOP-13345.006.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add delete
tracking.
> Current behavior on delete is to remove the metadata from the MetadataStore.  To make
deletes consistent, we need to add a {{isDeleted}} flag to {{PathMetadata}} and check it when
returning results from functions like {{getFileStatus()}} and {{listStatus()}}.  In HADOOP-13651,
I added TODO comments in most of the places these new conditions are needed.  The work does
not look too bad.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message