hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-16085) S3Guard: use object version to protect against inconsistent read after replace/overwrite
Date Fri, 01 Feb 2019 18:55:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758577#comment-16758577 ]

Steve Loughran commented on HADOOP-16085:
-----------------------------------------

The enemy here is eventual consistency, which is, of course, the whole reason S3Guard was needed.


What issues are we worrying about?

# Mixed writers: some clients use S3Guard, some don't. Even in non-auth mode, I worry about
delete tombstones.
# Failure during large operations, leaving S3Guard out of sync with the store.
# Failure during a workflow, with one or more GET calls on the second attempt picking up the
old version.

HADOOP-15625 is going to address changes to a file while it is open through etag comparison,
but without the etag being cached in the S3Guard repository, it's not going to detect inconsistencies
between the version expected and the version read.
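To make the point about caching concrete, here is a toy Python model, assuming a store that may serve the previous version of an object after an overwrite. `StaleStore`, `MetadataStore`, `write` and `read_checked` are hypothetical names, not S3A or S3Guard classes, and MD5 of the body simply stands in for an etag. The comparison only catches a stale read when the expected etag was recorded at write time; with nothing cached, the old bytes come back silently.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def etag_of(data: bytes) -> str:
    # MD5 of the body stands in for an S3 etag in this toy model.
    return hashlib.md5(data).hexdigest()


@dataclass
class StaleStore:
    """Hypothetical object store that can serve the previous version after an overwrite."""
    current: Dict[str, bytes] = field(default_factory=dict)
    previous: Dict[str, bytes] = field(default_factory=dict)
    serve_stale: bool = False  # set True to simulate an eventually consistent GET

    def put(self, key: str, data: bytes) -> str:
        if key in self.current:
            self.previous[key] = self.current[key]
        self.current[key] = data
        return etag_of(data)

    def get(self, key: str) -> Tuple[bytes, str]:
        if self.serve_stale and key in self.previous:
            data = self.previous[key]
        else:
            data = self.current[key]
        return data, etag_of(data)


class MetadataStore:
    """Hypothetical consistent table that caches the etag of each write."""

    def __init__(self) -> None:
        self.etags: Dict[str, str] = {}

    def record(self, key: str, etag: str) -> None:
        self.etags[key] = etag

    def expected(self, key: str) -> Optional[str]:
        return self.etags.get(key)


class StaleReadError(Exception):
    pass


def write(store: StaleStore, meta: MetadataStore, key: str, data: bytes) -> None:
    # Record the etag returned by the PUT so later readers know what to expect.
    meta.record(key, store.put(key, data))


def read_checked(store: StaleStore, meta: MetadataStore, key: str) -> bytes:
    data, etag = store.get(key)
    expected = meta.expected(key)
    # Without a cached etag there is nothing to compare against.
    if expected is not None and etag != expected:
        raise StaleReadError(f"{key}: expected etag {expected}, got {etag}")
    return data


if __name__ == "__main__":
    store, meta = StaleStore(), MetadataStore()
    write(store, meta, "table/part-0000", b"v1")
    write(store, meta, "table/part-0000", b"v2")
    store.serve_stale = True
    try:
        read_checked(store, meta, "table/part-0000")
    except StaleReadError as err:
        print("detected:", err)
```

Reading the same stale object through an empty `MetadataStore()` returns `b"v1"` without complaint, which is the gap that caching the etag is meant to close.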

Personally, I'm reluctant to rely on S3Guard as the sole defence against this problem.

bq.  a re-run of a pipeline stage should always use a new output directory,

If you use the S3A committers for your work in their default mode (inserting a GUID into
each filename), then every filename is unique, and it becomes impossible to get a RAW
(read-after-write) inconsistency.
This is essentially where we are going, along with Apache Iceberg (incubating). Rather than
jump through hoop after hoop of workarounds for S3's apparent decision never to deliver consistent
views, come up with data structures that need only one point of consistency (you need to
know the unique filename of the latest Iceberg file).
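A minimal sketch of the "never overwrite, publish through one pointer" pattern, assuming a plain dict as the bucket and a second dict as the single consistent pointer. `write_unique`, `commit` and `read_latest` are hypothetical helpers, loosely in the spirit of the approach described above; they are not the S3A committer or Apache Iceberg APIs.

```python
import json
import uuid
from typing import Dict, List


def write_unique(store: Dict[str, bytes], prefix: str, data: bytes) -> str:
    # A fresh GUID per file means each key is written at most once,
    # so no reader can ever see an "old" version of it.
    key = f"{prefix}/part-0000-{uuid.uuid4()}.dat"
    if key in store:  # vanishingly unlikely; fail rather than overwrite
        raise FileExistsError(key)
    store[key] = data
    return key


def commit(store: Dict[str, bytes], pointer: Dict[str, str],
           table: str, files: List[str]) -> str:
    # Write an immutable manifest under its own unique name...
    manifest_key = f"{table}/metadata/{uuid.uuid4()}.json"
    store[manifest_key] = json.dumps(files).encode()
    # ...then update the single pointer. This is the one place that must be
    # consistent; here it is a dict, in practice it would need a store with
    # consistent (ideally atomic) updates.
    pointer[table] = manifest_key
    return manifest_key


def read_latest(store: Dict[str, bytes], pointer: Dict[str, str],
                table: str) -> List[bytes]:
    manifest = json.loads(store[pointer[table]])
    return [store[key] for key in manifest]


if __name__ == "__main__":
    bucket: Dict[str, bytes] = {}
    latest: Dict[str, str] = {}
    first = write_unique(bucket, "events/data", b"batch-1")
    commit(bucket, latest, "events", [first])
    second = write_unique(bucket, "events/data", b"batch-2")
    commit(bucket, latest, "events", [second])
    print(read_latest(bucket, latest, "events"))
```

Because data and manifest keys are never reused, the only lookup that has to be consistent is `pointer[table]`; everything it references was written once and never changes.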

Putting that aside, yes, keeping version markers would be good. I like etags because they
are exposed in getFileChecksum(); their flaw is that they can be very large on massive multipart
uploads (MPUs): 32 bytes per block uploaded.


BTW, if you are worried about how observable eventual consistency is: generally it shows up as
delayed listings rather than stale content. There's a really good paper with experimental data
which measures how often you can observe RAW inconsistencies: http://www.aifb.kit.edu/images/8/8d/Ic2e2014.pdf


> S3Guard: use object version to protect against inconsistent read after replace/overwrite
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16085
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16085
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Ben Roling
>            Priority: Major
>         Attachments: HADOOP-16085_3.2.0_001.patch
>
>
> Currently S3Guard doesn't track S3 object versions. If a file is written in S3A with
> S3Guard and then subsequently overwritten, there is no protection against the next reader
> seeing the old version of the file instead of the new one.
> It seems like the S3Guard metadata could track the S3 object version. When a file is
> created or updated, the object version could be written to the S3Guard metadata. When a
> file is read, the read out of S3 could be performed by object version, ensuring the correct
> version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my impression from
> looking through the code. My organization is looking to shift some datasets stored in HDFS
> over to S3 and is concerned about this potential issue, as there are some cases in our codebase
> that would do an overwrite.
> I imagine this idea may have been considered before, but I couldn't quite track down any
> JIRAs discussing it. If there is one, feel free to close this with a reference to it.
> Am I understanding things correctly? Is this idea feasible? Any feedback that could
> be provided would be appreciated. We may consider crafting a patch.
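The proposal in the issue description could be modelled roughly like this, assuming a versioned bucket whose plain "latest" GET may lag behind an overwrite. S3 GET requests can name a specific version when bucket versioning is enabled; `VersionedStore` and `VersionTrackingFS` are hypothetical stand-ins for that behaviour and for the S3Guard table, not real S3Guard or AWS SDK classes.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class VersionedStore:
    """Hypothetical versioned bucket: every put creates a new immutable version."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[str, bytes]]] = {}
        self._ids = itertools.count(1)
        self.stale_latest = False  # simulate an eventually consistent "latest" GET

    def put(self, key: str, data: bytes) -> str:
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        history = self._versions[key]
        if version_id is not None:
            # A read by version is unambiguous: it returns exactly that version.
            for vid, data in history:
                if vid == version_id:
                    return data
            raise KeyError(f"{key}@{version_id}")
        # An unversioned GET may still return the previous version.
        index = -2 if self.stale_latest and len(history) > 1 else -1
        return history[index][1]


class VersionTrackingFS:
    """Hypothetical client that records each write's version in a metadata table."""

    def __init__(self, store: VersionedStore) -> None:
        self.store = store
        self.metadata: Dict[str, str] = {}  # stands in for the S3Guard table

    def write(self, path: str, data: bytes) -> None:
        self.metadata[path] = self.store.put(path, data)

    def read(self, path: str) -> bytes:
        # Falls back to an unversioned GET if the path was never recorded,
        # e.g. when it was written by a client not using the metadata table.
        return self.store.get(path, version_id=self.metadata.get(path))
```

A reader going through `VersionTrackingFS.read` gets the version recorded at write time even while the unversioned lookup is stale; files written by other clients (the mixed-writer case) still fall back to the unprotected path.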



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


