spark-user mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: spark 2.0.0 - when saving a model to S3 spark creates temporary files. Why?
Date Thu, 25 Aug 2016 11:56:43 GMT
Hi,

Seems this just prevents writers from leaving partial data in the
destination dir when jobs fail.
Previous versions of Spark had a way to write data directly to the
destination, but Spark v2.0+ has no way to do that because of a critical
issue on S3 (see: SPARK-10063).
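A minimal sketch of the write-to-temporary-then-commit pattern described above (not Spark's actual committer code; the helper name and layout are illustrative): output goes to a temporary file first and is only promoted to the final name once the whole write succeeds, so a failed job leaves nothing partial at the destination. The commit step is also why the credentials need delete permission, and why, on S3, where rename is really copy-then-delete, the final step can be slow.

```python
import os
import tempfile

def commit_write(dest_path: str, data: bytes) -> None:
    """Write data to dest_path via a temporary file (hypothetical helper)."""
    dest_dir = os.path.dirname(dest_path) or "."
    # 1. Write everything to a temporary file in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix="_temporary_")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # 2. Promote the temp file to the final name. On a real
        #    filesystem this rename is atomic; on S3 there is no atomic
        #    rename, so it becomes copy + delete of each object.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # 3. On failure, remove the temp file; the destination stays
        #    untouched -- this requires delete permission.
        os.unlink(tmp_path)
        raise
```

Readers see either a complete file at the destination or no file at all, never a truncated one.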

// maropu


On Thu, Aug 25, 2016 at 2:40 PM, Tal Grynbaum <tal.grynbaum@gmail.com>
wrote:

> I read somewhere that it's because S3 has to know the size of the file
> upfront.
> I don't really understand this: why is it OK not to know it for the
> temp files but not OK for the final files?
> The delete permission is the minor disadvantage from my side; the worst
> thing is that I have a cluster of 100 machines sitting idle for 15 minutes
> waiting for the copy to end.
>
> Any suggestions on how to avoid that?
>
> On Thu, Aug 25, 2016, 08:21 Aseem Bansal <asmbansal2@gmail.com> wrote:
>
>> Hi
>>
>> When Spark saves anything to S3 it creates temporary files. Why? Asking
>> this because it requires the access credentials to be given delete
>> permissions along with write permissions.
>>
>


-- 
---
Takeshi Yamamuro
