spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-3150) NullPointerException in Spark recovery after simultaneous fall of master and driver
Date Wed, 20 Aug 2014 18:28:27 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104288#comment-14104288
] 

Apache Spark commented on SPARK-3150:
-------------------------------------

User 'tanyatik' has created a pull request for this issue:
https://github.com/apache/spark/pull/2062

> NullPointerException in Spark recovery after simultaneous fall of master and driver
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-3150
>                 URL: https://issues.apache.org/jira/browse/SPARK-3150
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.2
>         Environment:  Linux 3.2.0-23-generic x86_64
>            Reporter: Tatiana Borisova
>
> The issue occurs when Spark runs in standalone mode on a cluster.
> When the master and the driver fail simultaneously on one cluster node, the master tries
> to recover its state and restart the Spark driver.
> While restarting the driver, the master fails with a NullPointerException (stacktrace below).
> After failing, it restarts, tries to recover its state, and restarts the Spark driver again,
> over and over in an infinite loop.
> Specifically, Spark reads the DriverInfo state from ZooKeeper, but after deserialization
> the DriverInfo.worker field is null.
> Stacktrace (on version 1.0.0, but reproducible on version 1.0.2, too):
> [2014-08-14 21:44:59,519] ERROR  (akka.actor.OneForOneStrategy)
> java.lang.NullPointerException
>         at org.apache.spark.deploy.master.Master$$anonfun$completeRecovery$5.apply(Master.scala:448)
>         at org.apache.spark.deploy.master.Master$$anonfun$completeRecovery$5.apply(Master.scala:448)
>         at scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>         at scala.collection.TraversableLike$class.filter(TraversableLike.scala:263)
>         at scala.collection.AbstractTraversable.filter(Traversable.scala:105)
>         at org.apache.spark.deploy.master.Master.completeRecovery(Master.scala:448)
>         at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:376)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> How to reproduce: kill both the master and the driver processes on the same node while
> running Spark in standalone mode on a cluster.
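The null field described above is consistent with Java serialization semantics: a field marked `@transient` is skipped during serialization and comes back as `null` after deserialization, even when it is declared as an `Option`, so any method call on it throws a NullPointerException. A minimal, hypothetical sketch of that failure mode and one way to guard against it (class and method names here are illustrative, not Spark's actual code; the actual fix is in the pull request linked above):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical sketch: a @transient Option field deserializes as null,
// not None, so recovery code that calls methods on it would NPE.
// Re-initializing the field in readObject restores a safe default.
class DriverInfoSketch(val id: String) extends Serializable {
  @transient var worker: Option[String] = None

  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    worker = None // restore the default instead of leaving the field null
  }
}

object RecoveryRoundTrip {
  // Serialize and deserialize, mimicking a read of persisted recovery state.
  def roundTrip(info: DriverInfoSketch): DriverInfoSketch = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(info)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[DriverInfoSketch]
  }
}
```

Without the `readObject` override, the round-tripped object's `worker` field would be `null` and the filter in `completeRecovery` would throw exactly as in the stacktrace above; with it, the field is `None` and null-safe checks like `worker.isDefined` work.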



--
This message was sent by Atlassian JIRA
(v6.2#6252)

