spark-user mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: Impact of .localCheckpoint() and executor dying
Date Wed, 06 Jan 2021 19:36:59 GMT
Hi,

> impact of an executor dying after a localCheckpoint is taken.

My memory is a bit vague on this, but I wouldn't be surprised if the
localCheckpoint-ed RDD were "broken" and any action on it simply threw an
exception about missing partitions or similar. There's no way back, since
the lineage needed to recompute them has been cut.
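A toy model in plain Python of the failure mode described above. It is not Spark's API: every class and method name is hypothetical, and Spark's block manager and task retries are not modeled. Assumptions: partitions are placed round-robin on two executors, a local checkpoint keeps each partition only on the executor that computed it, and a reliable checkpoint writes to shared storage standing in for HDFS. Because the lineage is cut in both cases, a lost partition cannot be recomputed, so only the local variant fails after an executor dies.

```python
"""Toy model (plain Python, not Spark's API) of local vs reliable checkpointing.

Assumptions, for illustration only: partitions are assigned to executors
round-robin, a local checkpoint keeps each partition only on the executor
that computed it, and a reliable checkpoint writes every partition to
shared storage that survives executor loss. Once checkpointed, the
lineage is cut, so a missing partition cannot be recomputed.
"""


class MissingPartitionError(Exception):
    """Raised when a checkpointed partition can no longer be found."""


class ToyCluster:
    def __init__(self, num_executors: int) -> None:
        # executor id -> {partition index: data}
        self.executor_blocks = {e: {} for e in range(num_executors)}
        # Stands in for reliable storage such as HDFS (hypothetical).
        self.shared_storage = {}

    def checkpoint(self, partitions, reliable: bool) -> None:
        """Store each partition, either on its executor or in shared storage."""
        num_executors = len(self.executor_blocks)
        for index, data in enumerate(partitions):
            executor = index % num_executors  # round-robin task placement
            if reliable:
                self.shared_storage[index] = data
            else:
                self.executor_blocks[executor][index] = data

    def kill_executor(self, executor: int) -> None:
        """Losing an executor loses every block it held locally."""
        self.executor_blocks.pop(executor, None)

    def read(self, index: int):
        """Read a checkpointed partition; there is no lineage to fall back on."""
        if index in self.shared_storage:
            return self.shared_storage[index]
        for blocks in self.executor_blocks.values():
            if index in blocks:
                return blocks[index]
        raise MissingPartitionError(f"checkpointed partition {index} not found")

    def collect(self, num_partitions: int):
        """An action: it needs every partition, so one missing block fails it."""
        return [x for i in range(num_partitions) for x in self.read(i)]


partitions = [[1, 2], [3, 4], [5, 6], [7, 8]]

local = ToyCluster(num_executors=2)
local.checkpoint(partitions, reliable=False)
local.kill_executor(1)  # executor 1 held partitions 1 and 3
try:
    local.collect(len(partitions))
except MissingPartitionError as err:
    print("local checkpoint:", err)

reliable = ToyCluster(num_executors=2)
reliable.checkpoint(partitions, reliable=True)
reliable.kill_executor(1)  # shared storage is unaffected
print("reliable checkpoint:", reliable.collect(len(partitions)))
```

In real Spark, the reliable variant corresponds to calling `SparkContext.setCheckpointDir` and then `rdd.checkpoint()`, while `rdd.localCheckpoint()` keeps the data on the executors.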

I wish someone with more expertise in this area would chime in...

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski

On Wed, Jan 6, 2021 at 8:30 PM Brett Larson <brettpatricklarson@gmail.com> wrote:

> Jacek,
> Thanks for your response, I am still trying to understand the impact of an
> executor dying after a localCheckpoint is taken.
>
> Would the entire Spark application fail in this case due to the broken
> lineage? Or would the jobs associated with that executor need to be
> recomputed from scratch?
>
> Thank you!
>
>
> On Wed, Jan 6, 2021 at 1:09 PM Jacek Laskowski <jacek@japila.pl> wrote:
>
>> Hi,
>>
>> > My understanding is that .localCheckpoint() breaks the lineage of the RDD
>>
>> True.
>>
>> > and this requires that the entire RDD be rebuilt instead of being able
>> > to recompute lost partitions.
>>
>> In a sense, it's as if you saved the partitions to executors and read them
>> back as source data (for this checkpointed RDD).
>>
>> > Does each executor store a copy of the entire RDD?
>>
>> No. An executor holds only the data of the partitions computed by the
>> tasks it has executed.
>>
>> > Checkpoint over .localCheckpoint.
>>
>> checkpoint is similar to localCheckpoint, but slower and reliable (the data
>> is written to a stable file system such as HDFS, not to an ephemeral
>> executor). In either case, the lineage is cut the same way.
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://about.me/JacekLaskowski
>> "The Internals Of" Online Books <https://books.japila.pl/>
>> Follow me on https://twitter.com/jaceklaskowski
>>
>>
>> On Wed, Jan 6, 2021 at 6:15 PM brettplarson <brettpatricklarson@gmail.com> wrote:
>>
>>> Hello,
>>> I am wondering what the impact would be of using .localCheckpoint() and
>>> then having an executor die.
>>>
>>> My understanding is that .localCheckpoint() breaks the lineage of the RDD
>>> and this requires that the entire RDD be rebuilt instead of being able to
>>> recompute lost partitions.
>>>
>>> Does each executor store a copy of the entire RDD?
>>>
>>> It's unclear to me what the benefit of using .checkpoint() over
>>> .localCheckpoint() is. (I am aware that checkpoint is HDFS-backed, but the
>>> implications of this are unclear to me.)
>>>
>>> Please let me know,
>>> Thank you!
>>>
>
> --
> Brett Larson
> brettpatricklarson@gmail.com / 847321200
>
