spark-user mailing list archives

From Takeshi Yamamuro <>
Subject Re: spark 2.0.0 - when saving a model to S3 spark creates temporary files. Why?
Date Thu, 25 Aug 2016 11:56:43 GMT

It seems this just prevents writers from leaving partial data in the
destination dir when a job fails.
Earlier versions of Spark had a way to write data directly to the
destination, but Spark v2.0+ has no way to do that because of the critical
issue on S3 (See:
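The write-to-a-temporary-directory-then-move pattern Takeshi describes can be sketched as below. This is a minimal illustration of the commit idea, not Spark's actual committer code: all output goes to a temp directory first, and files are moved into the destination only if every write succeeded, so a mid-job failure never leaves partial data in the destination. On a normal filesystem the final move is a cheap rename; on S3 it is a copy, which is where the slow "commit" phase comes from.

```python
# Minimal sketch of the temp-dir commit pattern (not Spark's code):
# write everything to a temporary directory, then move files into the
# destination only after all writes succeed.
import os
import shutil
import tempfile


def commit_write(dest_dir: str, files: dict) -> None:
    """Write all files into a temp dir, then move them into dest_dir.

    If any write fails, nothing reaches dest_dir; the temp dir is
    cleaned up either way.
    """
    tmp = tempfile.mkdtemp(prefix="_temporary_")
    try:
        # Phase 1: write all output to the temporary location.
        for name, data in files.items():
            with open(os.path.join(tmp, name), "w") as f:
                f.write(data)
        # Phase 2 ("commit"): move completed files into the destination.
        # On a local filesystem this is a rename; on S3 it is a copy.
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            shutil.move(os.path.join(tmp, name), os.path.join(dest_dir, name))
    finally:
        shutil.rmtree(tmp, ignore_errors=True)
```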
// maropu

On Thu, Aug 25, 2016 at 2:40 PM, Tal Grynbaum <> wrote:

> I read somewhere that it's because S3 has to know the size of the file
> upfront.
> I don't really understand this: why is it OK not to know it for the
> temp files but not OK for the final files?
> The delete permission is the minor disadvantage from my side; the worst
> thing is that I have a cluster of 100 machines sitting idle for 15 minutes
> waiting for the copy to end.
> Any suggestions on how to avoid that?
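One mitigation commonly suggested around this Spark/Hadoop era for the slow commit phase is the version-2 FileOutputCommitter algorithm, which moves each task's output into the destination as the task finishes instead of in one serial copy at job commit. A hedged sketch (`your_job.py` is a placeholder; the tradeoff is weaker failure/speculation guarantees, so verify it suits your workload):

```shell
# Sketch: enable the v2 commit algorithm, which skips the serial
# job-commit copy step. Test carefully before relying on it.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  your_job.py
```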
> On Thu, Aug 25, 2016, 08:21 Aseem Bansal <> wrote:
>> Hi
>> When Spark saves anything to S3 it creates temporary files. Why? Asking
>> because this requires the access credentials to be given delete
>> permissions along with write permissions.
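Because of those temporary files, the credentials Spark uses need delete as well as write access on the bucket. A hedged sketch of an IAM policy granting that (`my-bucket` is a placeholder bucket name; tighten the resource paths for your setup):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```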

Takeshi Yamamuro
