hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-13447) S3Guard: Refactor S3AFileSystem to support introduction of separate metadata repository and tests.
Date Wed, 10 Aug 2016 15:58:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415482#comment-15415482 ]

Chris Nauroth edited comment on HADOOP-13447 at 8/10/16 3:57 PM:
-----------------------------------------------------------------

I'm attaching patch v001 to demonstrate what I have in mind.  The test code refactoring in
HADOOP-13446 is a prerequisite for this patch.

There are at least 2 more things I want to do with this patch before it's ready:

# I want to write a true unit test that mocks S3 client interactions, to prove that the patch
does in fact set us up to be able to mock the S3 calls effectively (and therefore simulate
eventual consistency).
# I have introduced a test failure in {{ITestS3AFileContextStatistics#testStatistics}}.  The root
cause is that handling of {{FileSystem.Statistics}} through {{DelegateToFileSystem}} is a
bit funky in terms of scope/lifetime of that stats instance.  I haven't found the best fix
yet, though.  All other existing tests are passing.
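
To make item 1 concrete, here is a minimal sketch of the kind of deterministic fake such a unit test could use. It is not code from the patch. {{ObjectLister}} is a hypothetical stand-in for the small slice of the {{AmazonS3}} interface a test would replace, and {{LaggingLister}} is an invented fake whose listings miss each new key for a fixed number of list calls, which makes the eventual-consistency window repeatable.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the part of the S3 client interface under test.
interface ObjectLister {
  void put(String key);
  List<String> list(String prefix);
}

// Fake store: a key written by put() is missed by the next `lag` list() calls.
class LaggingLister implements ObjectLister {
  private static final class Pending {
    final String key;
    int listsToMiss;

    Pending(String key, int listsToMiss) {
      this.key = key;
      this.listsToMiss = listsToMiss;
    }
  }

  private final int lag;
  private final List<String> visible = new ArrayList<>();
  private final List<Pending> pending = new ArrayList<>();

  LaggingLister(int lag) {
    this.lag = lag;
  }

  @Override
  public void put(String key) {
    pending.add(new Pending(key, lag));
  }

  @Override
  public List<String> list(String prefix) {
    // Promote keys whose lag has elapsed; count down the rest.
    Iterator<Pending> it = pending.iterator();
    while (it.hasNext()) {
      Pending p = it.next();
      if (p.listsToMiss == 0) {
        visible.add(p.key);
        it.remove();
      } else {
        p.listsToMiss--;
      }
    }
    List<String> result = new ArrayList<>();
    for (String key : visible) {
      if (key.startsWith(prefix)) {
        result.add(key);
      }
    }
    return result;
  }
}
```

With a lag of 1, the first listing after a write does not contain the new key and the second one does, so a test can assert on both sides of the inconsistency window without timing dependencies.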

Here is a summary of changes broken down by significant classes:
* {{S3AFileSystem}}: This is now a much smaller class.  It will be responsible for initializing
an {{S3Store}}, which encapsulates the S3 calls, and a concrete subclass of {{AbstractS3AccessPolicy}},
which will control how client calls coordinate with S3 and optionally other external metadata
repositories.
* {{S3ClientFactory}}: This is a factory for construction of the S3 client instance.  Note
that its return type is defined as {{AmazonS3}} (an interface from the AWS SDK), not {{AmazonS3Client}}
(the concrete implementation that issues HTTP calls to the S3 back-end).  This is the indirection
that will allow us to mock the S3 calls.  Tests will be able to configure a different factory
to return a mock client.  The default implementation is {{DefaultS3ClientFactory}}, and all
pre-existing configuration logic related to the S3 client has moved here.
* {{S3Store}}: Much of the existing code of {{S3AFileSystem}} has moved here.  This class
encapsulates how client calls translate to S3 calls.  This layer uses {{Configuration}} to
look up the desired {{S3ClientFactory}} implementation.
* {{AbstractS3AccessPolicy}} / {{DirectS3AccessPolicy}}: Policy classes define how client
calls coordinate between S3 calls (the {{S3Store}}) and optionally other external metadata
repositories.  Currently, the only concrete implementation just delegates directly to S3,
which provides the same semantics as the existing S3A codebase.  The scope of the various
"implement access policy" sub-tasks is to implement other subclasses that provide different
semantics: caching, cross-validation for strong consistency, etc.
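
The layering described above can be sketched roughly as follows. The class names come from the summary, but every method signature here is an assumption made for illustration, not the API in the attached patch. {{ObjectClient}} stands in for the AWS SDK's {{AmazonS3}} interface, a plain {{Map}} stands in for Hadoop's {{Configuration}}, and {{InMemoryS3ClientFactory}} and the configuration key are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the AWS SDK's AmazonS3 interface, reduced to one invented method.
interface ObjectClient {
  long contentLength(String key); // -1 when the key is absent (sketch convention)
}

// Role of S3ClientFactory: returns the interface type, never a concrete client.
interface S3ClientFactory {
  ObjectClient createClient(Map<String, String> conf);
}

// Hypothetical factory a test might configure; it serves objects from memory.
class InMemoryS3ClientFactory implements S3ClientFactory {
  @Override
  public ObjectClient createClient(Map<String, String> conf) {
    Map<String, Long> objects = new HashMap<>();
    objects.put("data/part-0000", 42L);
    return key -> objects.getOrDefault(key, -1L);
  }
}

// Role of S3Store: translates client calls to S3 calls; picks the factory from config.
class S3Store {
  static final String FACTORY_KEY = "sketch.s3.client.factory"; // hypothetical key
  private final ObjectClient client;

  S3Store(Map<String, String> conf) {
    String impl = conf.getOrDefault(FACTORY_KEY, InMemoryS3ClientFactory.class.getName());
    try {
      S3ClientFactory factory = Class.forName(impl)
          .asSubclass(S3ClientFactory.class)
          .getDeclaredConstructor()
          .newInstance();
      this.client = factory.createClient(conf);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot create factory " + impl, e);
    }
  }

  long length(String key) {
    return client.contentLength(key);
  }
}

// Role of AbstractS3AccessPolicy: decides how calls coordinate with the store.
abstract class AbstractS3AccessPolicy {
  protected final S3Store store;

  AbstractS3AccessPolicy(S3Store store) {
    this.store = store;
  }

  abstract long getFileLength(String key);
}

// Role of DirectS3AccessPolicy: delegates straight to the store, no other metadata.
class DirectS3AccessPolicy extends AbstractS3AccessPolicy {
  DirectS3AccessPolicy(S3Store store) {
    super(store);
  }

  @Override
  long getFileLength(String key) {
    return store.length(key);
  }
}
```

In this sketch a file system class would only build an {{S3Store}} and a policy and forward calls to the policy. Swapping the factory class name in the configuration is what lets a test substitute a fake client without touching the store or policy code.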



> S3Guard: Refactor S3AFileSystem to support introduction of separate metadata repository and tests.
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13447
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13447
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13447-HADOOP-13446.001.patch
>
>
> The scope of this issue is to refactor the existing {{S3AFileSystem}} into multiple coordinating
> classes.  The goal of this refactoring is to separate the {{FileSystem}} API binding from
> the AWS SDK integration, make code maintenance easier while we're making changes for S3Guard,
> and make it easier to mock some implementation details so that tests can simulate eventual
> consistency behavior in a deterministic way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


