> Nan Zhu
> On Monday, June 2, 2014 at 3:03 PM, Patrick Wendell wrote:
> Hey There,
> The issue was that the old behavior could cause users to silently
> overwrite data, which is pretty bad, so to be conservative we decided
> to enforce the same checks that Hadoop does.
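The Hadoop-style check being described is a fail-fast guard: refuse to start the job if the output directory already exists. A minimal sketch of that idea in plain Python (illustration only, not Spark's or Hadoop's actual code; `check_output_specs` is a hypothetical name echoing Hadoop's `checkOutputSpecs`):

```python
import os
import tempfile

def check_output_specs(output_dir):
    """Refuse to write if the output directory already exists,
    mirroring the Hadoop-style pre-flight check (illustration only)."""
    if os.path.exists(output_dir):
        raise FileExistsError(f"Output directory {output_dir} already exists")

# Usage: the first save passes the check, a second save to the same path fails fast
base = tempfile.mkdtemp()
target = os.path.join(base, "output")
check_output_specs(target)   # no error: directory does not exist yet
os.makedirs(target)
try:
    check_output_specs(target)
except FileExistsError:
    print("refused to overwrite")
```

The point of checking up front, rather than during the write, is that no data is touched before the job is rejected, so nothing can be silently overwritten.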
> This was documented in this JIRA:
> However, it would be very easy to add an option that allows preserving
> the old behavior. Is anyone here interested in contributing that? I
> created a JIRA for it:
> - Patrick
> On Mon, Jun 2, 2014 at 9:22 AM, Pierre Borckmans wrote:
> Indeed, the behavior has changed, for good or for bad. I agree with
> the danger you mention, but I'm not sure it happens like that. Isn't
> there an overwrite mechanism in Hadoop that automatically removes the
> old part files, writes to a _temporary folder, and only then writes the
> part files along with the _SUCCESS file?
> In any case this change of behavior should be documented IMO.
> On June 2, 2014, at 5:42 PM, Nicholas Chammas <firstname.lastname@example.org>
> wrote:
> What I've found using saveAsTextFile() against S3 (prior to Spark 1.0.0) is
> that files get overwritten automatically. There is one danger to this,
> though: if I save to a directory that already has 20 part- files, but this
> time around I'm only saving 15 part- files, then the 5 leftover part- files
> from the previous set will be mixed in with the 15 newer files. This is
> potentially dangerous.
> I haven't checked to see if this behavior has changed in 1.0.0. Are you