spark-issues mailing list archives

From "Hong Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-4838) StackOverflowError when serialization task
Date Tue, 16 Dec 2014 03:26:14 GMT

    [ https://issues.apache.org/jira/browse/SPARK-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247676#comment-14247676 ]

Hong Shen commented on SPARK-4838:
----------------------------------

This is the whole stack.
All we can tell is that it is thrown from DAGScheduler.submitMissingTasks, when serializing stage.rdd.
{code}
    var taskBinary: Broadcast[Array[Byte]] = null
    try {
      // For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep).
      // For ResultTask, serialize and broadcast (rdd, func).
      val taskBinaryBytes: Array[Byte] =
        if (stage.isShuffleMap) {
          closureSerializer.serialize((stage.rdd, stage.shuffleDep.get) : AnyRef).array()
        } else {
          closureSerializer.serialize((stage.rdd, stage.resultOfJob.get.func) : AnyRef).array()
        }
      taskBinary = sc.broadcast(taskBinaryBytes)
    } catch {
      // In the case of a failure during serialization, abort the stage.
      case e: NotSerializableException =>
        abortStage(stage, "Task not serializable: " + e.toString)
        runningStages -= stage
        return
      case NonFatal(e) =>
        abortStage(stage, s"Task serialization failed: $e\n${e.getStackTraceString}")
        runningStages -= stage
        return
    }
{code}
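
Serializing stage.rdd walks the whole RDD lineage: each RDD's dependencies reference its parent RDDs, so Java serialization recurses once per lineage level, and a lineage thousands of RDDs deep exhausts the default thread stack. Below is a minimal reproduction sketch (not from this ticket; names and the loop count are illustrative) that builds such a lineage by unioning many small RDDs:
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reproduction sketch: build an RDD whose lineage nests
// thousands of parent RDDs, so serializing stage.rdd recurses once per
// level and can overflow the default thread stack.
object DeepLineageRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("deep-lineage-repro").setMaster("local[2]"))

    var rdd = sc.parallelize(Seq(0))
    for (i <- 1 to 3000) {
      // Each union adds one more level to the object graph that
      // closureSerializer.serialize((stage.rdd, ...)) has to traverse.
      rdd = rdd.union(sc.parallelize(Seq(i)))
    }

    // Submitting the job serializes the full lineage in
    // DAGScheduler.submitMissingTasks; with a deep enough lineage this
    // throws java.lang.StackOverflowError before any task runs.
    println(rdd.count())
    sc.stop()
  }
}
{code}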


> StackOverflowError when serialization task
> ------------------------------------------
>
>                 Key: SPARK-4838
>                 URL: https://issues.apache.org/jira/browse/SPARK-4838
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.1.0
>            Reporter: Hong Shen
>
> When running a SQL query with more than 2000 partitions, each partition a HadoopRDD, it causes
> a java.lang.StackOverflowError while serializing the task.
> The error message from Spark is: Job aborted due to stage failure: Task serialization failed:
> java.lang.StackOverflowError
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> ......
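
A common workaround for deep lineages of this kind (not taken from this ticket; the checkpoint path and interval are illustrative) is to checkpoint the RDD periodically so the lineage that DAGScheduler has to serialize stays short, or to raise the thread stack size:
{code}
// Hedged workaround sketch: truncate the lineage with checkpointing so the
// scheduler never has to serialize thousands of nested RDDs.
sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")  // hypothetical path

var rdd = sc.parallelize(Seq(0))
for (i <- 1 to 3000) {
  rdd = rdd.union(sc.parallelize(Seq(i)))
  if (i % 500 == 0) {
    rdd.checkpoint()  // mark for checkpointing: lineage is cut here
    rdd.count()       // force an action so the checkpoint actually happens
  }
}

// Alternatively, a larger thread stack can postpone the overflow, e.g.
// setting spark.driver.extraJavaOptions to -Xss16m.
{code}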



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

