spark-issues mailing list archives

From "wuyi (Jira)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-32898) totalExecutorRunTimeMs is too big
Date Thu, 17 Sep 2020 03:18:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-32898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197341#comment-17197341 ]

wuyi commented on SPARK-32898:
------------------------------

I think the issue (for executorRunTimeMs) is: a task may already be killed (e.g., by another
successful attempt) before it reaches "taskStartTimeNs = System.nanoTime()". In that case,
taskStartTimeNs never gets initialized and remains 0. However, collectAccumulatorsAndResetStatusOnFailure
calculates executorRunTimeMs as "System.nanoTime() - taskStartTimeNs", which obviously yields
a huge, wrong result when taskStartTimeNs = 0.

 

I haven't taken a detailed look at submissionTime, but it sounds like a different issue,
though it may stem from the same logic hole.

 

I'd like to make a fix for the executorRunTimeMs first if [~linhongliu-db] doesn't mind.
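
The failure mode described above can be sketched outside Spark. The object and method names
below are hypothetical, not Spark's actual code; they only mirror the logic in Executor.scala,
where a task killed before "taskStartTimeNs = System.nanoTime()" leaves the start time at 0:

```scala
// Minimal sketch (hypothetical names, not Spark's real code) of the
// executorRunTimeMs bug: subtracting an uninitialized (0) start time
// from the current monotonic clock yields a bogus, huge runtime.
object RunTimeBugSketch {
  // Mirrors the buggy calculation in collectAccumulatorsAndResetStatusOnFailure.
  def buggyRunTimeMs(taskStartTimeNs: Long): Long =
    (System.nanoTime() - taskStartTimeNs) / 1000000L

  // One possible guard: report 0 when the task never actually started.
  def guardedRunTimeMs(taskStartTimeNs: Long): Long =
    if (taskStartTimeNs == 0L) 0L
    else (System.nanoTime() - taskStartTimeNs) / 1000000L

  def main(args: Array[String]): Unit = {
    val neverStarted = 0L // task killed before taskStartTimeNs was assigned
    println(s"buggy:   ${buggyRunTimeMs(neverStarted)} ms")   // typically enormous
    println(s"guarded: ${guardedRunTimeMs(neverStarted)} ms") // 0
  }
}
```

A guard of this shape (skipping or zeroing the metric when taskStartTimeNs is still 0) is one
way a fix could avoid reporting the bogus value; the actual fix may differ.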

> totalExecutorRunTimeMs is too big
> ---------------------------------
>
>                 Key: SPARK-32898
>                 URL: https://issues.apache.org/jira/browse/SPARK-32898
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Linhong Liu
>            Priority: Major
>
> This might be caused by incorrectly calculating executorRunTimeMs in Executor.scala.
>  The function collectAccumulatorsAndResetStatusOnFailure(taskStartTimeNs) can be called
> when taskStartTimeNs has not been set yet (it is still 0).
> As of now in master branch, here is the problematic code: 
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L470]
>  
> An exception is thrown before this line, and the catch branch still updates the metric.
>  However, the query shows as SUCCESSful. Maybe this task is speculative. Not sure.
>  
> submissionTime in LiveExecutionData may also have a similar problem.
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala#L449]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

