hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sean Mackrory (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-16085) S3Guard: use object version to protect against inconsistent read after replace/overwrite
Date Fri, 01 Feb 2019 21:20:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758672#comment-16758672

Sean Mackrory commented on HADOOP-16085:

Thanks for submitting a patch [~ben.roling]. Haven't had a chance to do a full review yet,
but one of [~fabbri]'s comments was also high on my list of things to watch out for:
{quote}Backward / forward compatible with existing S3Guarded buckets and Dynamo tables.{quote}
Specifically, we need to gracefully deal with any row missing an object version. The other
direction is easy - if this simply adds a new field, old code will ignore it and we'll continue
to get the current behavior.

My other concern is that this requires enabling object versioning. I know [~fabbri] has done
some testing with that and I think eventually hit issues. Was it just a matter of the space
all the versions were taking up, or was it actually a performance problem once there was enough

> S3Guard: use object version to protect against inconsistent read after replace/overwrite
> ----------------------------------------------------------------------------------------
>                 Key: HADOOP-16085
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16085
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Ben Roling
>            Priority: Major
>         Attachments: HADOOP-16085_3.2.0_001.patch
> Currently S3Guard doesn't track S3 object versions.  If a file is written in S3A with
S3Guard and then subsequently overwritten, there is no protection against the next reader
seeing the old version of the file instead of the new one.
> It seems like the S3Guard metadata could track the S3 object version.  When a file is
created or updated, the object version could be written to the S3Guard metadata.  When a
file is read, the read out of S3 could be performed by object version, ensuring the correct
version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my impression from
looking through the code.  My organization is looking to shift some datasets stored in HDFS
over to S3 and is concerned about this potential issue as there are some cases in our codebase
that would do an overwrite.
> I imagine this idea may have been considered before but I couldn't quite track down any
JIRAs discussing it.  If there is one, feel free to close this with a reference to it.
> Am I understanding things correctly?  Is this idea feasible?  Any feedback that could
be provided would be appreciated.  We may consider crafting a patch.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message