hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13448) S3Guard: Define MetadataStore interface.
Date Wed, 10 Aug 2016 22:28:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416136#comment-15416136 ]

Chris Nauroth commented on HADOOP-13448:

bq. Why does your {{DynamoDBConsistentStore#save()}} implementation walk the path to the root
and save all ancestor paths as well?

That's a good observation.  I think this is a weakness of my prototype, not a desirable choice
intended to carry through to the full implementation.

More specifically, I approached my prototype by developing a separate hadoop-s3guard module
with a new {{ConsistentS3AFileSystem}} class defined as a subclass of the existing {{S3AFileSystem}}
class.  The benefit of this approach was that I didn't need to make a lot of code changes
directly in hadoop-aws, so I could develop the prototype isolated from the churn of merge
conflicts on upstream hadoop-aws patches.  (There was a lot of optimization and bug fixing
happening concurrently at the time.)  The drawback of this approach was that it constrained
my implementation.  For {{mkdirs}}, I could only call the superclass and then pass the path
to {{ConsistentStore#save}}, so the consistent store code needed a complete implementation
using solely that path argument.  There was no way for me to preserve the information discovered
in {{S3AFileSystem#innerMkdirs}} about which intermediate directories were missing, as was
done in your prototype.
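
To make the constraint concrete, here is a minimal sketch of what a path-only {{save}} ends up doing.  It is not the prototype's actual code: {{InMemoryConsistentStore}} and {{pathAndAncestors}} are hypothetical names, paths are plain absolute strings, and a sorted in-memory set stands in for DynamoDB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the prototype's consistent store.
// Assumes absolute, normalized paths such as "/a/b/c".
class InMemoryConsistentStore {
  private final Set<String> entries = new TreeSet<>();

  // Returns the path followed by each of its ancestors, ending with the root "/".
  static List<String> pathAndAncestors(String path) {
    List<String> result = new ArrayList<>();
    String current = path;
    while (!current.equals("/")) {
      result.add(current);
      int slash = current.lastIndexOf('/');
      current = (slash <= 0) ? "/" : current.substring(0, slash);
    }
    result.add("/");
    return result;
  }

  // With only the path available, the store cannot tell which ancestors
  // already exist, so it records every one of them on each call.
  void save(String path) {
    entries.addAll(pathAndAncestors(path));
  }

  Set<String> entries() {
    return entries;
  }
}
```

Saving {{/a/b/c}} in this sketch records {{/}}, {{/a}}, {{/a/b}} and {{/a/b/c}}, whether or not the ancestors were already known.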

I came to the conclusion that the subclassing approach wouldn't be ideal for reasons like
this.  We can get better results by hooking into implementation details more deeply, and that
led me to the refactoring proposed on HADOOP-13447.  Between {{S3Store}}, {{AbstractS3AccessPolicy}}
and the {{MetadataStore}} interface, we should feel free to evolve those interfaces however
it best suits requirements.  They are internal interfaces, so they don't need to be constrained
by the Hadoop compatibility guidelines, as long as {{S3AFileSystem}} can translate back to
the public {{FileSystem}} interface at the end.  In the example you gave here, maybe that
means something like {{S3Store#mkdirs}} returning a result object that lists which directories
in the ancestry were not pre-existing.
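
As a rough illustration of that idea, the sketch below shows one possible shape for such a result object.  Nothing here is a settled API: {{MkdirsResult}}, {{getCreatedDirectories}} and {{InMemoryS3Store}} are hypothetical names, and an in-memory set of absolute string paths stands in for S3.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical result type: the directories that mkdirs actually had to create.
final class MkdirsResult {
  private final List<String> createdDirectories;

  MkdirsResult(List<String> createdDirectories) {
    this.createdDirectories =
        Collections.unmodifiableList(new ArrayList<>(createdDirectories));
  }

  List<String> getCreatedDirectories() {
    return createdDirectories;
  }
}

// Hypothetical in-memory stand-in for S3Store.  The root always exists.
class InMemoryS3Store {
  private final Set<String> directories = new HashSet<>(Collections.singleton("/"));

  MkdirsResult mkdirs(String path) {
    List<String> created = new ArrayList<>();
    String current = path;
    // Walk up the ancestry until an existing directory is found.
    while (!directories.contains(current)) {
      created.add(current);
      int slash = current.lastIndexOf('/');
      current = (slash <= 0) ? "/" : current.substring(0, slash);
    }
    directories.addAll(created);
    Collections.reverse(created); // report shallowest directory first
    return new MkdirsResult(created);
  }
}
```

A caller could then pass only {{getCreatedDirectories()}} to the metadata store, rather than having the store re-walk the whole ancestry.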

Another, smaller reason my prototype worked that way is that it was also easy to hook a call
to {{ConsistentStore#save}} onto the close of the stream returned by {{FileSystem#create}}.
Unlike {{mkdirs}}, {{create}} does no walk up the ancestry to check for pre-existing directories,
so I had to take care of it entirely within my code.  This is really more of a bug in the
existing S3A code that I was working around, though.  (See HADOOP-13221.)
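
For illustration only, the hook on stream close could look roughly like the following.  {{SaveOnCloseStream}} is a hypothetical name, the callback stands in for {{ConsistentStore#save}}, and the sketch wraps a plain {{java.io.OutputStream}} rather than the Hadoop stream types.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.function.Consumer;

// Hypothetical wrapper: invokes a save callback once the wrapped stream
// has been closed successfully.
class SaveOnCloseStream extends FilterOutputStream {
  private final String path;
  private final Consumer<String> saveCallback; // stand-in for ConsistentStore#save
  private boolean closed;

  SaveOnCloseStream(OutputStream out, String path, Consumer<String> saveCallback) {
    super(out);
    this.path = path;
    this.saveCallback = saveCallback;
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    // Delegate in bulk; FilterOutputStream's default writes one byte at a time.
    out.write(b, off, len);
  }

  @Override
  public void close() throws IOException {
    if (closed) {
      return; // make repeated close() calls harmless
    }
    closed = true;
    super.close();             // flush and close the wrapped stream first
    saveCallback.accept(path); // record the file only after the close succeeded
  }
}
```

Because the callback receives nothing but the path, any ancestor bookkeeping has to happen inside the store, which is the limitation described above.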

> S3Guard: Define MetadataStore interface.
> ----------------------------------------
>                 Key: HADOOP-13448
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13448
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
> Define the common interface for metadata store operations.  This is the interface that
> any metadata back-end must implement in order to integrate with S3Guard.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
