hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13650) S3Guard: Provide command line tools to manipulate metadata store.
Date Thu, 12 Jan 2017 03:00:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819990#comment-15819990 ]

Aaron Fabbri commented on HADOOP-13650:

I also wanted to comment on the addition of fsck features.  IMHO we should do it as a separate
JIRA.  We have diff, import, and destroy, which together provide basic tools for diagnosis
and repair.  I think we should also have a "fsck check" command that simply returns a failure
code if any invariants are violated.  In particular, it should fail if a MetadataStore directory
is marked as authoritative and its contents differ from those in S3.  That violates the "this
is the full directory contents" invariant of the DirListingMetadata#isAuthoritative flag.
Of course, the DynamoDB MetadataStore does not currently persist the isAuthoritative flag on
listings, so this check would always pass for now.  When we add that feature (which will be
needed for performance improvements), this will be a good tool to see whether things have diverged
(e.g. due to a client crashing or concurrent modifications to overlapping subtrees).
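
To make the authoritative-listing invariant concrete, here is a rough sketch of what the check could look like.  It is written in Python for brevity rather than the Java it would actually be implemented in, and everything in it is an assumption for illustration: the DirListing class, the dict-based stand-ins for the MetadataStore and S3, and the fsck_check name are hypothetical, not S3Guard APIs.

```python
from dataclasses import dataclass, field


@dataclass
class DirListing:
    """Hypothetical stand-in for a MetadataStore directory listing."""
    children: set = field(default_factory=set)
    is_authoritative: bool = False


def fsck_check(metadata_store, s3_listing):
    """Return 0 if every authoritative directory matches S3, else 1.

    metadata_store: dict mapping directory path -> DirListing (assumed shape)
    s3_listing:     dict mapping directory path -> set of child names in S3
    """
    violations = []
    for path, listing in sorted(metadata_store.items()):
        if not listing.is_authoritative:
            # Non-authoritative listings may legitimately be partial.
            continue
        if listing.children != s3_listing.get(path, set()):
            violations.append(path)
    for path in violations:
        print(f"authoritative listing differs from S3: {path}")
    return 1 if violations else 0
```

The return value is meant to be used as the process exit code, so a script can treat any non-zero result as "invariant violated".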

Along those lines, a "fsck fix" command could, for any directory where that invariant
fails, reload the contents of that directory from S3.  Eventual list consistency could cause
false positives here (a stale S3 listing looks like a divergence), and "fsck fix" would then
persist the stale listing, so that is a concern.
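
A minimal sketch of the repair step, under the same caveats: Python instead of Java, plain dicts standing in for the MetadataStore and S3, and a hypothetical fsck_fix name.  Note that it trusts whatever S3 listing it is handed, so a stale listing would be written back, which is exactly the concern above.

```python
def fsck_fix(metadata_store, s3_listing):
    """Reload from S3 every authoritative directory whose children differ.

    metadata_store: dict path -> {"children": set, "authoritative": bool}
                    (assumed shape, for illustration only)
    s3_listing:     dict path -> set of child names currently listed by S3
    Returns the sorted list of paths that were rewritten.
    """
    repaired = []
    for path, entry in metadata_store.items():
        if not entry["authoritative"]:
            continue  # only authoritative listings carry the invariant
        s3_children = s3_listing.get(path, set())
        if entry["children"] != s3_children:
            # Overwrite with the S3 view; a stale S3 listing gets persisted here.
            entry["children"] = set(s3_children)
            repaired.append(path)
    return sorted(repaired)
```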

Note that the "fsck check" command could also return failure when a path exists in the MetadataStore
but not in S3.  Again, this is subject to eventual list consistency, and that would need to
be documented.  It could have a configurable time period after which we assume list consistency
is no longer an issue (e.g. if a two-day-old file exists in the MetadataStore but not in S3, that is
likely *not* due to eventual consistency).
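
The grace-period idea could be sketched roughly as follows.  Again this is Python for brevity, and the function name, the dict of modification times, and the two-day default (taken from the example above, not from any existing config key) are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def missing_from_s3(metadata_entries, s3_paths, now, grace=timedelta(days=2)):
    """Return paths present in the MetadataStore but absent from S3.

    Entries younger than `grace` are skipped, since their absence from an
    S3 listing may just be eventual consistency.

    metadata_entries: dict path -> modification time (timezone-aware datetime)
    s3_paths:         set of paths currently listed by S3
    """
    return sorted(
        path
        for path, mtime in metadata_entries.items()
        if path not in s3_paths and now - mtime > grace
    )
```

For example, with `now` set to the current UTC time, a file recorded three days ago and missing from S3 would be reported, while one recorded an hour ago would not.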

> S3Guard: Provide command line tools to manipulate metadata store.
> -----------------------------------------------------------------
>                 Key: HADOOP-13650
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13650
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HADOOP-13650-HADOOP-13345.000.patch, HADOOP-13650-HADOOP-13345.001.patch,
> HADOOP-13650-HADOOP-13345.002.patch, HADOOP-13650-HADOOP-13345.003.patch, HADOOP-13650-HADOOP-13345.004.patch,
> HADOOP-13650-HADOOP-13345.005.patch, HADOOP-13650-HADOOP-13345.006.patch, HADOOP-13650-HADOOP-13345.007.patch,
> Similar systems like EMRFS have CLI tools to manipulate the metadata store, i.e.,
> create or delete the metadata store, or {{import}} and {{sync}} the file metadata between
> the metadata store and S3.
> http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emrfs-cli-reference.html
> S3Guard should offer similar functionality. 

