spark-issues mailing list archives

From "Shixiong Zhu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-19764) Executors hang with supposedly running task that are really finished.
Date Wed, 01 Mar 2017 06:42:45 GMT

    [ https://issues.apache.org/jira/browse/SPARK-19764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889611#comment-15889611 ]

Shixiong Zhu commented on SPARK-19764:
--------------------------------------

These are the master and worker logs. From the master log, you are using PySpark in client mode.
The driver logs should simply be written to the console. Could you paste the output of the pyspark shell?

> Executors hang with supposedly running task that are really finished.
> ---------------------------------------------------------------------
>
>                 Key: SPARK-19764
>                 URL: https://issues.apache.org/jira/browse/SPARK-19764
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core
>    Affects Versions: 2.0.2
>         Environment: Ubuntu 16.04 LTS
> OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
> Spark 2.0.2 - Spark Cluster Manager
>            Reporter: Ari Gesher
>         Attachments: driver-log-stderr.log, executor-2.log, netty-6153.jpg, SPARK-19764.tgz
>
>
> We've come across a job that won't finish. Running on a six-node cluster, each of the
> executors ends up with 5-7 tasks that are never marked as completed.
> Here's an excerpt from the web UI:
> ||Index||ID||Attempt||Status||Locality Level||Executor ID / Host||Launch Time||Duration||Scheduler Delay||Task Deserialization Time||GC Time||Result Serialization Time||Getting Result Time||Peak Execution Memory||Shuffle Read Size / Records||Errors||
> |105|1131|0|SUCCESS|PROCESS_LOCAL|4 / 172.31.24.171|2017/02/27 22:51:36|1.9 min|9 ms|4 ms|0.7 s|2 ms|6 ms|384.1 MB|90.3 MB / 572| |
> |106|1168|0|RUNNING|ANY|2 / 172.31.16.112|2017/02/27 22:53:25|6.5 h|0 ms|0 ms|1 s|0 ms|0 ms|384.1 MB|98.7 MB / 624| |
> However, the Executor reports the task as finished: 
> {noformat}
> 17/02/27 22:53:25 INFO Executor: Running task 106.0 in stage 5.0 (TID 1168)
> 17/02/27 22:55:29 INFO Executor: Finished task 106.0 in stage 5.0 (TID 1168). 2633558 bytes result sent via BlockManager)
> {noformat}
> As does the driver log:
> {noformat}
> 17/02/27 22:53:25 INFO Executor: Running task 106.0 in stage 5.0 (TID 1168)
> 17/02/27 22:55:29 INFO Executor: Finished task 106.0 in stage 5.0 (TID 1168). 2633558 bytes result sent via BlockManager)
> {noformat}
> The full log from this executor and the {{stderr}} from {{app-20170227223614-0001/2/stderr}}
> are attached.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

