spark-issues mailing list archives

From "wuyi (Jira)" <>
Subject [jira] [Commented] (SPARK-32898) totalExecutorRunTimeMs is too big
Date Thu, 17 Sep 2020 03:18:00 GMT


wuyi commented on SPARK-32898:

I think the issue (for executorRunTimeMs) is: before a task reaches "taskStartTimeNs = System.nanoTime()",
it might already have been killed (e.g., by another successful attempt). In that case, taskStartTimeNs
never gets initialized and remains 0. However, executorRunTimeMs is calculated as "System.nanoTime()
- taskStartTimeNs" in collectAccumulatorsAndResetStatusOnFailure, which obviously produces a huge,
wrong result when taskStartTimeNs = 0.
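
For illustration, here is a minimal Scala sketch of that failure mode. The field and method names echo Executor.scala, but the scaffolding around them is hypothetical, not the real TaskRunner code:

    object RunTimeBugSketch {
      def main(args: Array[String]): Unit = {
        // Mirrors the TaskRunner field: it defaults to 0 and is only set
        // once the task body actually starts running.
        var taskStartTimeNs: Long = 0L

        // Suppose the task is killed before "taskStartTimeNs = System.nanoTime()"
        // ever executes, so the field keeps its default value of 0.

        // The subtraction then measures System.nanoTime() against an arbitrary
        // origin rather than the task start, yielding an absurdly large value.
        val executorRunTimeMs = (System.nanoTime() - taskStartTimeNs) / 1000000L
        println(s"bogus executorRunTimeMs = $executorRunTimeMs")
      }
    }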


I haven't taken a detailed look at submissionTime, but it sounds like a different
issue? Though it may stem from the same logic hole.


I'd like to put up a fix for executorRunTimeMs first, if [~linhongliu-db] doesn't mind.
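
A fix along these lines would presumably just guard the subtraction against the uninitialized value. A hedged sketch (not the actual patch):

    import java.util.concurrent.TimeUnit

    // Sketch only, not the actual SPARK-32898 patch: treat an uninitialized
    // (0) start time as "task never started" and report a run time of 0.
    def safeExecutorRunTimeMs(taskStartTimeNs: Long): Long =
      if (taskStartTimeNs == 0L) 0L
      else TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - taskStartTimeNs)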

> totalExecutorRunTimeMs is too big
> ---------------------------------
>                 Key: SPARK-32898
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Linhong Liu
>            Priority: Major
> This might be caused by incorrectly calculating executorRunTimeMs in Executor.scala.
>  The function collectAccumulatorsAndResetStatusOnFailure(taskStartTimeNs) can be called
when taskStartTimeNs is not set yet (it is still 0).
> As of now in the master branch, here is the problematic code:
> []
> An exception is thrown before this line, and the catch branch still updates the metric.
>  However, the query shows as SUCCESSful. Maybe this task is speculative; not sure.
> submissionTime in LiveExecutionData may also have a similar problem.
> []
