spark-user mailing list archives

From sjayatheertha <sjayatheer...@gmail.com>
Subject Re: Persist RDD doubt
Date Thu, 23 Mar 2017 23:03:08 GMT


Spark’s cache is fault-tolerant – if any partition of an RDD is lost, it will automatically
be recomputed using the transformations that originally created it.
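A minimal sketch in Scala illustrating this (the class name PersistExample, the local[*] master, and the toy dataset are assumptions made only for the example, not anything from this thread):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("persist-example").setMaster("local[*]"))

    // Build an RDD through a transformation chain; this chain (the lineage)
    // is what Spark replays to rebuild any lost partitions.
    val rdd = sc.parallelize(1 to 1000000).map(_ * 2)

    // Ask Spark to keep the partitions in memory in serialized form.
    rdd.persist(StorageLevel.MEMORY_ONLY_SER)

    // The first action materializes and caches the partitions.
    println(rdd.count())

    // Later actions reuse the cached partitions; if an executor is lost,
    // only the missing partitions are recomputed from the lineage.
    println(rdd.reduce(_ + _))

    // The storage level requested for this RDD can be inspected, and
    // sc.getPersistentRDDs lists all RDDs currently marked as persistent.
    println(rdd.getStorageLevel)
    println(sc.getPersistentRDDs.keys)

    sc.stop()
  }
}

When the cached data is no longer needed, rdd.unpersist() releases the blocks explicitly.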




> On Mar 23, 2017, at 4:11 AM, nayan sharma <nayansharma13@gmail.com> wrote:
> 
> In case of task failures, does Spark clear the persisted RDD (StorageLevel.MEMORY_ONLY_SER)
> and recompute it when the task is reattempted from the beginning? Or will the cached
> RDD be appended to?
> 
> How does Spark check whether the RDD has already been cached, and skip the caching step
> for a particular task?
> 
>> On 23-Mar-2017, at 3:36 PM, Artur R <artur@gpnxgroup.com> wrote:
>> 
>> I am not quite sure, but:
>>  - if the RDD is persisted in memory, then on task failure the executor JVM process fails
>> too, so the memory is released
>>  - if the RDD is persisted on disk, then on task failure Spark's shutdown hook just wipes
>> the temp files
>> 
>>> On Thu, Mar 23, 2017 at 10:55 AM, Jörn Franke <jornfranke@gmail.com> wrote:
>>> What do you mean by "clear"? What is the use case?
>>> 
>>>> On 23 Mar 2017, at 10:16, nayan sharma <nayansharma13@gmail.com> wrote:
>>>> 
>>>> Does Spark clear the persisted RDD if a task fails?
>>>> 
>>>> Regards,
>>>> 
>>>> Nayan
>> 
> 
