spark-user mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: spark 2.0.0 - when saving a model to S3 spark creates temporary files. Why?
Date Thu, 25 Aug 2016 12:20:13 GMT
afaik no.

// maropu

On Thu, Aug 25, 2016 at 9:16 PM, Tal Grynbaum <tal.grynbaum@gmail.com>
wrote:

> Is/was there an option similar to DirectParquetOutputCommitter to write
> json files to S3 ?
>
> On Thu, Aug 25, 2016 at 2:56 PM, Takeshi Yamamuro <linguin.m.s@gmail.com>
> wrote:
>
>> Hi,
>>
>> It seems this just prevents writers from leaving partial data in the
>> destination dir when jobs fail.
>> In previous versions of Spark there was a way to write data directly
>> to the destination, but
>> Spark v2.0+ has no way to do that because of a critical issue on S3
>> (see SPARK-10063).
>>
>> // maropu
>>
>>
>> On Thu, Aug 25, 2016 at 2:40 PM, Tal Grynbaum <tal.grynbaum@gmail.com>
>> wrote:
>>
>>> I read somewhere that it's because S3 has to know the size of the file
>>> upfront.
>>> I don't really understand this: why is it OK not to know it for
>>> the temp files but not OK for the final files?
>>> The delete permission is the minor disadvantage from my side; the worst
>>> thing is that I have a cluster of 100 machines sitting idle for 15 minutes
>>> waiting for the copy to end.
>>>
>>> Any suggestions how to avoid that?
>>>
>>> On Thu, Aug 25, 2016, 08:21 Aseem Bansal <asmbansal2@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> When Spark saves anything to S3 it creates temporary files. Why? Asking
>>>> because this requires the access credentials to be given
>>>> delete permissions along with write permissions.
>>>>
>>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
>
>
> --
> *Tal Grynbaum* / *CTO & co-founder*
>
> m# +972-54-7875797
>
>         mobile retention done right
>
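
For readers of the thread: a minimal sketch of the "write to a temporary
location, then move into place on success" idea described in the quoted
messages. This is NOT Spark's or Hadoop's actual committer code; it is a
local-filesystem illustration, and the names write_with_commit, fail_at and
demo are hypothetical. On a local filesystem the final move is a cheap rename.
As far as I understand, on S3 a rename is carried out as a copy followed by a
delete, which would explain both the delete permission and the long copy
phase mentioned above.

```python
import os
import shutil
import tempfile


def write_with_commit(dest_dir, parts, fail_at=None):
    """Write each part under dest_dir/_temporary, then move it into dest_dir.

    fail_at is a hypothetical knob that simulates a task failing mid-job.
    """
    tmp_dir = os.path.join(dest_dir, "_temporary")
    os.makedirs(tmp_dir, exist_ok=True)
    try:
        for i, data in enumerate(parts):
            if fail_at == i:
                raise RuntimeError("simulated task failure")
            with open(os.path.join(tmp_dir, f"part-{i:05d}"), "w") as f:
                f.write(data)
        # Commit step: only reached if every part was written successfully.
        for name in sorted(os.listdir(tmp_dir)):
            os.replace(os.path.join(tmp_dir, name), os.path.join(dest_dir, name))
    finally:
        # Clean up the temporary directory whether the job succeeded or failed.
        shutil.rmtree(tmp_dir, ignore_errors=True)


def demo():
    """Run one successful job and one failing job; return the visible outputs."""
    root = tempfile.mkdtemp()
    ok_dir = os.path.join(root, "ok")
    bad_dir = os.path.join(root, "bad")
    write_with_commit(ok_dir, ["a", "b"])
    try:
        write_with_commit(bad_dir, ["a", "b"], fail_at=1)
    except RuntimeError:
        pass
    result = (sorted(os.listdir(ok_dir)), sorted(os.listdir(bad_dir)))
    shutil.rmtree(root)
    return result
```

In this sketch the failed job leaves its destination directory empty, which
is the "no partial data" property; the price is the extra move at the end.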



-- 
---
Takeshi Yamamuro
