spark-issues mailing list archives

From "Sujith Jay Nair (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-22714) Spark API Not responding when Fatal exception occurred in event loop
Date Wed, 03 Jan 2018 09:42:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309366#comment-16309366
] 

Sujith Jay Nair commented on SPARK-22714:
-----------------------------------------

Hi [~todesking], is this reproducible outside the Spark REPL? I'm trying to understand whether
this is specific to the Spark shell.

> Spark API Not responding when Fatal exception occurred in event loop
> --------------------------------------------------------------------
>
>                 Key: SPARK-22714
>                 URL: https://issues.apache.org/jira/browse/SPARK-22714
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: todesking
>            Priority: Critical
>
> To reproduce, make Spark throw an OOM exception in the event loop:
> {noformat}
> scala> spark.sparkContext.getConf.get("spark.driver.memory")
> res0: String = 1g
> scala> val a = new Array[Int](4 * 1000 * 1000)
> scala> val ds = spark.createDataset(a)
> scala> ds.rdd.zipWithIndex
> [Stage 0:>                                                          (0 + 0) / 3]Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError: Java heap space
> [Stage 0:>                                                          (0 + 0) / 3]
> // Spark is not responding
> {noformat}
> While unresponsive, Spark is waiting on a Promise that is never completed.
> The promise depends on work done in the event loop thread, but that thread is dead once
> a fatal exception has been thrown.
> {noformat}
> "main" #1 prio=5 os_prio=31 tid=0x00007ffc9300b000 nid=0x1703 waiting on condition [0x0000700000216000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000007ad978eb8> (a scala.concurrent.impl.Promise$CompletionLatch)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
>         at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:619)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
>         at org.apache.spark.rdd.ZippedWithIndexRDD.<init>(ZippedWithIndexRDD.scala:50)
>         at org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1293)
>         at org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1293)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>         at org.apache.spark.rdd.RDD.zipWithIndex(RDD.scala:1292)
> {noformat}
> I don't know how to fix it properly, but it seems we need to add fatal error handling
> to EventLoop.run() in core/EventLoop.scala.
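
For illustration only, here is a minimal sketch of the kind of fatal error handling the description asks for. It does not reproduce Spark's internals: `SketchEventLoop`, `post`, `failPending` and `wrap` are hypothetical names, the events are plain `() => Int` functions, and the OutOfMemoryError is simulated by throwing it directly. The idea is that when the loop thread dies, every pending promise is failed, so a caller blocked on one gets an exception instead of parking forever.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try
import scala.util.control.NonFatal

// Hypothetical stand-in for an event loop; this is not Spark's EventLoop class.
final class SketchEventLoop(name: String) {
  // Each event carries the promise that some caller may be blocked on.
  private case class Event(work: () => Int, promise: Promise[Int])

  private val queue = new LinkedBlockingQueue[Event]()
  private val lock = new Object
  private var dead: Throwable = null // guarded by lock; set once the loop has died

  private def wrap(t: Throwable): Throwable =
    new IllegalStateException(s"$name terminated", t)

  // Record the cause of death and fail every event still waiting in the queue.
  private def failPending(cause: Throwable): Unit = lock.synchronized {
    dead = cause
    var next = queue.poll()
    while (next != null) {
      next.promise.tryFailure(cause)
      next = queue.poll()
    }
  }

  private val thread = new Thread(name) {
    override def run(): Unit =
      try {
        while (true) {
          val event = queue.take()
          try event.promise.success(event.work())
          catch {
            // Ordinary failure: report it to the waiter and keep looping.
            case NonFatal(e) => event.promise.failure(e)
            // Fatal error: fail this waiter, then rethrow to the outer handler.
            case fatal: Throwable =>
              event.promise.tryFailure(wrap(fatal))
              throw fatal
          }
        }
      } catch {
        // The loop is going away for good: release everyone still waiting.
        case t: Throwable => failPending(wrap(t))
      }
  }
  thread.setDaemon(true)

  def start(): Unit = thread.start()

  // Enqueue work; if the loop is already dead, fail fast instead of queueing.
  def post(work: () => Int): Future[Int] = {
    val promise = Promise[Int]()
    lock.synchronized {
      if (dead != null) promise.failure(dead) else queue.put(Event(work, promise))
    }
    promise.future
  }
}

object SketchEventLoopDemo {
  def main(args: Array[String]): Unit = {
    val loop = new SketchEventLoop("sketch-event-loop")
    val ok = loop.post(() => 1 + 1)
    val boom = loop.post(() => throw new OutOfMemoryError("simulated"))
    val queued = loop.post(() => 42) // still in the queue when the loop dies
    loop.start()
    println(Await.result(ok, 5.seconds))          // prints 2
    println(Try(Await.result(boom, 5.seconds)))   // a Failure, not a hang
    println(Try(Await.result(queued, 5.seconds))) // a Failure, not a hang
  }
}
```

Whether failing pending work is enough, or whether the driver should instead shut down after a real OutOfMemoryError, is a design decision this sketch does not settle.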



