spark-user mailing list archives

From Zhitao Yan (阎志涛) <tony....@tendcloud.com>
Subject Executor hang
Date Sun, 07 Oct 2018 12:24:12 GMT
Hi all,
I am running Spark 2.1 on Hadoop 2.7.2 with YARN. While executing Spark tasks, some executors
keep running forever without finishing. Consider the following screenshot:
[inline image: Spark UI screenshot of the stage's task table]
We can see that executor 4 has been running for 26 minutes, and its shuffle read size/records have
stayed unchanged for the whole 26 minutes as well. The thread dump for the task's thread is as follows:
[inline images: thread dump screenshots from the Spark UI]

The Linux version is Linux version 4.14.62-70.117.amzn2.x86_64 (mockbuild@ip-10-0-1-79) and the
JDK version is Oracle JDK 1.8.0_181. Running jstack on the machine, I can see the following
thread dump:

"Executor task launch worker for task 3806" #54 daemon prio=5 os_prio=0 tid=0x0000000001230800
nid=0x1fc runnable [0x00007fba0e600000]
   java.lang.Thread.State: RUNNABLE
       at java.lang.StringCoding.encode(StringCoding.java:364)
       at java.lang.String.getBytes(String.java:941)
       at org.apache.spark.unsafe.types.UTF8String.fromString(UTF8String.java:109)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
       at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
       at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:243)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:190)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188)
       at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
       at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
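
For reference, every jstack sample shows the same RUNNABLE frame, so the task appears to be
spending all of its time converting String values to Spark's internal UTF8String inside the
generated whole-stage-codegen iterator. A minimal sketch of that code path as I read it from the
Spark 2.1 sources (treat the details as my assumption, not anything confirmed):

import org.apache.spark.unsafe.types.UTF8String

// UTF8String.fromString(s) calls s.getBytes(StandardCharsets.UTF_8), which
// is the String.getBytes -> StringCoding.encode pair at the top of the dump.
// The generated processNext() performs this once per string field, per row.
val converted: UTF8String = UTF8String.fromString("example value")

Since the shuffle read counters stay flat the whole time, it looks to me more like the thread is
not making progress than like it is simply slow.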

I wonder why this happened. Is it related to my environment, or is it a bug in Spark?
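
In the meantime I am considering enabling speculative execution, so that a task which hangs like
this can be re-launched on another executor. A minimal sketch of the configuration I have in mind
(the multiplier is a guess on my side, not a tuned value):

import org.apache.spark.sql.SparkSession

// Speculation re-launches tasks that run much slower than their peers in
// the same stage; it should at least route around a single hung executor.
val spark = SparkSession.builder()
  .config("spark.speculation", "true")
  .config("spark.speculation.multiplier", "4")  // guessed; the default is 1.5
  .getOrCreate()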

Thanks and Regards,
Tony

Zhitao Yan (阎志涛)
VP of R&D

Mobile  +86 139 1181 5695
WeChat  zhitao_yan

Beijing TendCloud Tianxia Technology Co., Ltd. (北京腾云天下科技有限公司)
Room 608, Building 2, Courtyard 39, Dongzhimenwai Street, Beijing, 100027

TalkingData.com
