hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A
Date Mon, 22 May 2017 16:06:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019741#comment-16019741 ]

Steve Loughran commented on HADOOP-13345:

This is a read pipeline. What I think has happened is that the client did open(), and s3guard skipped
the existence check because ddb said the file was there (and how long it was). The HTTP stream isn't
set up in open(); it relies on the HEAD having done the check first (a getFileStatus() is
called to verify the path isn't a dir; if the path isn't there it fails). (Note: we could do
a simpler check without the LIST call in the dir scan.)

Because with s3guard the HEAD request is skipped, it's only on the first seek that an attempt
is made to GET the file contents. No file, error. There's nothing wrong with that per se;
it just means that if s3guard is inconsistent with the store, things show up later.
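
To make the failure mode concrete, here is a toy model of the sequence described above. It is a sketch only: {{LazyOpenSketch}}, {{LazyStream}} and the two maps are invented for illustration and are not Hadoop classes. The maps stand in for S3 and the DDB table, and the GET is deferred to the first read.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: none of these classes exist in Hadoop. */
class LazyOpenSketch {

  /** Stands in for the object store (S3): path -> object bytes. */
  static final Map<String, byte[]> store = new HashMap<>();

  /** Stands in for the metadata table (DDB): path -> recorded length. */
  static final Map<String, Long> metadata = new HashMap<>();

  /** A stream whose GET is deferred until the first read. */
  static class LazyStream {
    private final String path;
    private final long recordedLength;
    private byte[] data; // stays null until the first GET

    LazyStream(String path, long recordedLength) {
      this.path = path;
      this.recordedLength = recordedLength;
    }

    /** Length as recorded in the metadata table, never checked against the store. */
    long length() {
      return recordedLength;
    }

    /** Returns the byte at pos, or -1 past the end; the first call performs the GET. */
    int read(long pos) throws FileNotFoundException {
      if (data == null) {
        byte[] found = store.get(path);
        if (found == null) {
          throw new FileNotFoundException("No object at " + path);
        }
        data = found;
      }
      return pos < data.length ? (data[(int) pos] & 0xff) : -1;
    }
  }

  /** open() trusts the metadata entry and never contacts the store. */
  static LazyStream open(String path) throws FileNotFoundException {
    Long length = metadata.get(path);
    if (length == null) {
      throw new FileNotFoundException("No metadata entry for " + path);
    }
    return new LazyStream(path, length);
  }

  public static void main(String[] args) throws Exception {
    metadata.put("s3a://bucket/gone", 10L); // listed in the table, absent from the store
    LazyStream in = open("s3a://bucket/gone"); // succeeds: no existence check
    try {
      in.read(0);
    } catch (FileNotFoundException e) {
      System.out.println("Failure surfaced on first read: " + e.getMessage());
    }
  }
}
```

In this model the open() call cannot fail for a path the table knows about, so the inconsistency only becomes visible at the first read.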

1. Could this be reported? e.g. when an FNFE is raised when opening a stream on an s3guarded
bucket, warn the user that this may be an inconsistency.
2. S3AInputStream relies on the file length being normative (see {{calculateRequestLimit}}).
If DDB thinks there is less data than there is, the extra data isn't picked up. You won't
be able to seek past the amount of data that s3guard thinks is in the file, even if there
is now more.
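
A rough sketch of both points, under stated assumptions: {{requestLimit}} below is a simplified stand-in, not the real {{calculateRequestLimit}}, and {{withHint}} is a hypothetical helper showing one way the FNFE could carry a warning. Neither is taken from the S3A code.

```java
import java.io.FileNotFoundException;

/** Illustrative only: these helpers are hypothetical, not the S3A code. */
class StaleLengthSketch {

  /**
   * Simplified stand-in for a request-limit calculation: the end offset of a
   * ranged GET is clamped to the length recorded in the metadata store, so
   * bytes beyond that recorded length are never requested.
   */
  static long requestLimit(long targetPos, long readahead, long recordedLength) {
    return Math.min(recordedLength, targetPos + readahead);
  }

  /** Point 1: add a hint to an FNFE when the bucket is s3guarded. */
  static FileNotFoundException withHint(FileNotFoundException e, boolean guarded) {
    if (!guarded) {
      return e;
    }
    FileNotFoundException hinted = new FileNotFoundException(
        e.getMessage()
            + " (the metadata store listed this file; it may be out of sync with S3)");
    hinted.initCause(e);
    return hinted;
  }

  public static void main(String[] args) {
    // The table recorded 100 bytes; even if the object has since grown,
    // a request starting at 90 with 64 bytes of readahead stops at 100.
    System.out.println(requestLimit(90, 64, 100)); // prints 100
    System.out.println(
        withHint(new FileNotFoundException("No such key"), true).getMessage());
  }
}
```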

We may want to have s3guard in non-auth mode do the HEAD on the final entry for that failfast
and to get the length. (Side topic: if we do that, and note the length is different, what
to do in s3guard itself?) This could be done in the s3a input stream: if fadvise=normal,
it could start with a full GET of the file & pick up content-length there. It's for the
seek-optimised random IO that we'd want to postpone the GET until the first readFully(), and
limit its length to something shorter.
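
One possible shape for that, as a sketch only: {{HeadOnOpenSketch}}, {{Policy}} and {{Opened}} are invented names, the map stands in for S3, and what s3guard itself should do on a length mismatch is left open (here it is just logged).

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: these names are illustrative, not Hadoop APIs. */
class HeadOnOpenSketch {

  /** Hypothetical mirror of the fadvise setting. */
  enum Policy { NORMAL, RANDOM }

  /** Stands in for S3: path -> object bytes. */
  static final Map<String, byte[]> store = new HashMap<>();

  /** Result of open(): the length to trust, and whether the body was already fetched. */
  static class Opened {
    final long length;
    final boolean bodyFetched;

    Opened(long length, boolean bodyFetched) {
      this.length = length;
      this.bodyFetched = bodyFetched;
    }
  }

  /** Simulated store lookup: returns the real length, or fails fast if the object is missing. */
  static long probe(String path) throws FileNotFoundException {
    byte[] data = store.get(path);
    if (data == null) {
      throw new FileNotFoundException("Nothing found at " + path);
    }
    return data.length;
  }

  /**
   * NORMAL: the probe models a full GET whose response carries content-length.
   * RANDOM: the probe models a HEAD; the GET is postponed until the first read.
   */
  static Opened open(String path, long recordedLength, Policy policy)
      throws FileNotFoundException {
    long actual = probe(path);
    if (actual != recordedLength) {
      // Open question from the comment above: what should the metadata store do here?
      System.err.println("WARN: metadata says " + recordedLength
          + " bytes, store says " + actual + " for " + path);
    }
    return new Opened(actual, policy == Policy.NORMAL);
  }

  public static void main(String[] args) throws Exception {
    store.put("s3a://bucket/grown", new byte[200]);
    Opened o = open("s3a://bucket/grown", 100, Policy.RANDOM);
    System.out.println(o.length + " " + o.bodyFetched); // prints 200 false
  }
}
```

Either way the stream starts from the length the store reports rather than the recorded one, at the cost of one extra request on the random IO path.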

> S3Guard: Improved Consistency for S3A
> -------------------------------------
>                 Key: HADOOP-13345
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13345
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13345.prototype1.patch, s3c.001.patch, S3C-ConsistentListingonS3-Design.pdf,
> S3GuardImprovedConsistencyforS3A.pdf, S3GuardImprovedConsistencyforS3AV2.pdf
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a stronger
> consistency model than what is currently offered.  The solution coordinates with a strongly
> consistent external store to resolve inconsistencies caused by the S3 eventual consistency
