spark-dev mailing list archives

From Ryan Blue
Subject Re: Output Committers for S3
Date Tue, 21 Feb 2017 21:24:38 GMT
Does S3Guard help with this? I thought it was like S3mper and could
help detect eventual consistency problems, but wouldn't help with the
committer problem.


On Tue, Feb 21, 2017 at 12:39 PM, Matthew Schauer wrote:
> Thanks for the repo, Ryan!  I had heard that Netflix had a committer that
> used the local filesystem as a temporary store, but I wasn't able to find
> that anywhere until now.  I implemented something similar that writes to
> HDFS and then copies to S3, but it doesn't use the multipart upload API, so
> I'm sure yours will be faster.  I think this is the best thing until S3Guard
> comes out.
> As far as my UUID-tracking approach goes, I was under the impression that a
> given task would write the same set of files on each attempt.  Thus, if the
> task fails, either the whole job is aborted and the files are removed, or
> the task is retried and the files are overwritten.  On the other hand, I can
> see how having partially-written data visible to readers immediately could
> cause problems, and that is a good reason to avoid my approach.
> Steve -- that design document was a very enlightening read.  I will be
> interested in following and possibly contributing to S3Guard in the future.

Ryan Blue
Software Engineer
