spark-user mailing list archives

From Alex Nastetsky <>
Subject Re: restart from last successful stage
Date Wed, 29 Jul 2015 15:28:46 GMT
I meant a restart by the user, as ayan said.

I was thinking of a case where e.g. a Spark conf setting is wrong and the job
fails in Stage 1, in my example .. and we want to rerun the job with the
right conf without rerunning Stage 0. Having this "restart" capability may
cause some chaos if the conf change would have altered how Stage 0 runs,
possibly creating partition incompatibilities or other issues.

Another option is to just persist the data from Stage 0 (i.e.
sc.saveAs....) and then run a modified version of the job that skips Stage
0, assuming you have a full understanding of how your job breaks down into
stages.

On Tue, Jul 28, 2015 at 9:28 PM, Tathagata Das <> wrote:

> Okay, maybe I am confused by the words "would be useful to *restart* from
> the output of stage 0" ... did the OP mean a restart by the user or a
> restart automatically by the system?
> On Tue, Jul 28, 2015 at 3:43 PM, ayan guha <> wrote:
>> Hi
>> I do not think the OP asks about attempt failure, but about stage failure
>> finally leading to job failure. In that case, the RDD info from the last
>> run is gone even if it was cached, isn't it?
>> Ayan
>> On 29 Jul 2015 07:01, "Tathagata Das" <> wrote:
>>> If you are using the same RDDs in both attempts to run the job, the
>>> stage outputs generated in the previous job will indeed be reused.
>>> This applies to core though. For DataFrames, depending on what you do,
>>> the physical plan may get generated again, leading to new RDDs, which may
>>> cause all the stages to be recomputed. Consider generating the RDD from
>>> the DataFrame once and then running the job using that.
>>> Of course, you can use caching in both core and DataFrames, which will
>>> solve all these concerns.
>>> On Tue, Jul 28, 2015 at 1:03 PM, Alex Nastetsky <> wrote:
>>>> Is it possible to restart the job from the last successful stage
>>>> instead of from the beginning?
>>>> For example, if your job has stages 0, 1 and 2 .. and stage 0 takes a
>>>> long time and is successful, but the job fails on stage 1, it would be
>>>> useful to be able to restart from the output of stage 0 instead of from the
>>>> beginning.
>>>> Note that I am NOT talking about Spark Streaming, just Spark Core (and
>>>> DataFrames); I'm not sure whether the case would be different with
>>>> Streaming.
>>>> Thanks.
