spark-user mailing list archives

From Tal Grynbaum <tal.grynb...@gmail.com>
Subject Re: spark 2.0.0 - when saving a model to S3 spark creates temporary files. Why?
Date Thu, 25 Aug 2016 12:16:49 GMT
Is there (or was there) an option similar to DirectParquetOutputCommitter
for writing JSON files to S3?

On Thu, Aug 25, 2016 at 2:56 PM, Takeshi Yamamuro <linguin.m.s@gmail.com>
wrote:

> Hi,
>
> It seems this just prevents writers from leaving partial data in the
> destination dir when jobs fail.
> In previous versions of Spark there was a way to write data directly
> to the destination, but Spark v2.0+ has no way to do that because of
> a critical issue on S3 (see SPARK-10063).
>
> // maropu
>
>
> On Thu, Aug 25, 2016 at 2:40 PM, Tal Grynbaum <tal.grynbaum@gmail.com>
> wrote:
>
>> I read somewhere that it's because S3 has to know the size of the file
>> upfront.
>> I don't really understand this: why is it OK not to know it for the
>> temp files but not OK for the final files?
>> The delete permission is a minor disadvantage from my side; the worst
>> thing is that I have a cluster of 100 machines sitting idle for 15 minutes
>> waiting for the copy to end.
>>
>> Any suggestions on how to avoid that?
>>
>> On Thu, Aug 25, 2016, 08:21 Aseem Bansal <asmbansal2@gmail.com> wrote:
>>
>>> Hi
>>>
>>> When Spark saves anything to S3 it creates temporary files. Why? I am
>>> asking because this requires the access credentials to be given
>>> delete permissions along with write permissions.
>>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
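For reference, here is a minimal sketch of the write-to-temporary-then-rename
pattern described in the quoted reply, using the local filesystem in place of
S3. The names write_with_commit and fail_at are made up for illustration and
are not Spark or Hadoop APIs; the real output committer is considerably more
involved. The point is only that a failed job leaves nothing in the
destination. On S3 there is no native rename, so the Hadoop S3 connectors
implement the rename as a copy followed by a delete, which is where the
delete permission and the long wait at the end of the job come from.

```python
import os
import shutil
import tempfile


def write_with_commit(dest_dir, parts, fail_at=None):
    """Write parts under dest_dir/_temporary, then move them into dest_dir.

    Hypothetical helper for illustration only. fail_at simulates a task
    failure before the part with that index is written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    temp_dir = os.path.join(dest_dir, "_temporary")
    os.makedirs(temp_dir, exist_ok=True)
    try:
        for i, data in enumerate(parts):
            if fail_at == i:
                raise RuntimeError("simulated task failure")
            with open(os.path.join(temp_dir, f"part-{i:05d}"), "w") as f:
                f.write(data)
        # Commit step: only reached when every part was written.
        # Locally this is a cheap rename; on S3 it would be copy + delete.
        for name in sorted(os.listdir(temp_dir)):
            os.rename(os.path.join(temp_dir, name),
                      os.path.join(dest_dir, name))
        return True
    except RuntimeError:
        # Abort: nothing has been moved into dest_dir yet.
        return False
    finally:
        # Cleaning up the temporary files is what needs delete permission.
        shutil.rmtree(temp_dir, ignore_errors=True)


base = tempfile.mkdtemp()
ok_dir = os.path.join(base, "ok")
bad_dir = os.path.join(base, "bad")

committed = write_with_commit(ok_dir, ["a", "b", "c"])
aborted = write_with_commit(bad_dir, ["a", "b", "c"], fail_at=2)

print(committed, sorted(os.listdir(ok_dir)))
print(aborted, sorted(os.listdir(bad_dir)))
```

In the second call two parts were already written when the failure hit, yet
the destination stays empty because they only ever existed under _temporary.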



-- 
*Tal Grynbaum* / *CTO & co-founder*

m# +972-54-7875797

        mobile retention done right
