spark-dev mailing list archives

From Ryan Blue <rb...@netflix.com.INVALID>
Subject Re: why there is inconsistency data between total job page and single job page in spark history server
Date Fri, 12 Apr 2019 00:08:04 GMT
We commonly see this in two situations:

First, if your driver died, then the history file will be incomplete. When
the history server is missing end events for stages, it assumes they are
still running and displays the duration from the start time until now.
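One way to check for this case is to scan the application's event log for stages that were submitted but never completed. The following is a minimal sketch, not the history server's own logic. It assumes an uncompressed JSON-lines event log with the event names `SparkListenerStageSubmitted`, `SparkListenerStageCompleted`, and `SparkListenerApplicationEnd`, and the `Stage Info` fields `Stage ID` and `Stage Attempt ID`; verify these against your Spark version. The function name `unfinished_stages` is made up for illustration.

```python
import json


def unfinished_stages(lines):
    """Return (sorted stage attempts with no completion event, saw_app_end).

    `lines` is any iterable of event-log lines (for example an open file).
    Event and field names are assumptions about the event log format.
    """
    submitted, completed = set(), set()
    saw_app_end = False
    for line in lines:
        try:
            event = json.loads(line)
        except ValueError:
            # Blank line, or a truncated final line if the driver died mid-write.
            continue
        if not isinstance(event, dict):
            continue
        name = event.get("Event")
        if name == "SparkListenerApplicationEnd":
            saw_app_end = True
        elif name in ("SparkListenerStageSubmitted", "SparkListenerStageCompleted"):
            info = event.get("Stage Info", {})
            key = (info.get("Stage ID"), info.get("Stage Attempt ID", 0))
            if name == "SparkListenerStageSubmitted":
                submitted.add(key)
            else:
                completed.add(key)
    return sorted(submitted - completed), saw_app_end
```

If the returned list is non-empty and no application end event was seen, the log is probably incomplete, and the durations shown for those stages should not be trusted.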

The second possibility is that the job did take a long time, but the time
wasn't spent running stages. This can happen if you have a large
broadcast variable that takes a long time to build. Because it is created
lazily, it may be built in the middle of a job.
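To see how much of a job's wall-clock time falls outside its stages, the job duration can be compared with the summed stage durations from the event log. This is a rough sketch under the same assumptions about the event log format: event names `SparkListenerJobStart`, `SparkListenerJobEnd`, and `SparkListenerStageCompleted`, and fields `Job ID`, `Stage IDs`, `Submission Time`, and `Completion Time`, all in milliseconds. The function name `job_time_breakdown` is hypothetical.

```python
import json


def job_time_breakdown(lines):
    """Map job id -> wall-clock ms, summed stage ms, and the difference.

    Field names are assumptions about the event log format. Stages that run
    concurrently are summed, so the difference can be negative.
    """
    job_start, job_end, job_stages, stage_ms = {}, {}, {}, {}
    for line in lines:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip blank or truncated lines
        if not isinstance(event, dict):
            continue
        name = event.get("Event")
        if name == "SparkListenerJobStart":
            job_start[event["Job ID"]] = event["Submission Time"]
            job_stages[event["Job ID"]] = event.get("Stage IDs", [])
        elif name == "SparkListenerJobEnd":
            job_end[event["Job ID"]] = event["Completion Time"]
        elif name == "SparkListenerStageCompleted":
            info = event.get("Stage Info", {})
            if "Submission Time" in info and "Completion Time" in info:
                spent = info["Completion Time"] - info["Submission Time"]
                stage_id = info["Stage ID"]
                # Add up all attempts of the same stage.
                stage_ms[stage_id] = stage_ms.get(stage_id, 0) + spent
    report = {}
    for job_id, start in job_start.items():
        if job_id not in job_end:
            continue  # job never finished in the log
        wall = job_end[job_id] - start
        in_stages = sum(stage_ms.get(s, 0) for s in job_stages[job_id])
        report[job_id] = {
            "wall_ms": wall,
            "stage_ms": in_stages,
            "unaccounted_ms": wall - in_stages,
        }
    return report
```

A large `unaccounted_ms` points at driver-side work between stages, such as building a broadcast variable, rather than at the stages themselves.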

On Thu, Apr 11, 2019 at 3:15 PM zhangliyun <kellyzly@126.com> wrote:

> Hi
>
> I have a question about how the Spark history server reports my job. It
> shows that a failed job took 1.4 h.
>
> But when I open the detailed job page and add up the durations of the
> completed and failed stages (2.0 min + 8 s + 29 s + 7 s + 3.5 min), the
> total is much less than 1.4 h.
> Is there anything I have misunderstood? My guess is that most of the time
> is spent on something other than computing, so my question is: where can I
> find out what that time is spent on?
>
> Best Regards
> ZhangLiyun/Kelly Zhang



-- 
Ryan Blue
Software Engineer
Netflix
