spark-issues mailing list archives

From "Tatiana Borisova (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-3150) NullPointerException in Spark recovery after simultaneous failure of master and driver
Date Wed, 20 Aug 2014 18:08:27 GMT
Tatiana Borisova created SPARK-3150:
---------------------------------------

             Summary: NullPointerException in Spark recovery after simultaneous failure of master and driver
                 Key: SPARK-3150
                 URL: https://issues.apache.org/jira/browse/SPARK-3150
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.0.2
         Environment:  Linux 3.2.0-23-generic x86_64
            Reporter: Tatiana Borisova


The issue occurs when Spark runs in standalone mode on a cluster.

When the master and the driver fail simultaneously on the same cluster node, the master tries to recover
its state and restart the Spark driver.
While restarting the driver, the master crashes with a NullPointerException (stack trace below).
After crashing, it restarts, tries to recover its state, and attempts to restart the Spark driver again.
This repeats in an infinite cycle.

Specifically, the master reads the persisted DriverInfo state back from ZooKeeper, but after it is
read, the DriverInfo.worker field turns out to be null.
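
The ticket does not pin down why the field comes back null, but one plausible mechanism (an assumption
here, not something the report states) is Java serialization: if DriverInfo.worker is a @transient field,
it is skipped when the state is persisted and is restored as null, not None, when the master deserializes
it during recovery. A minimal, self-contained Scala sketch with a hypothetical DriverInfoLike class:

    import java.io._

    // Hypothetical stand-in for DriverInfo: a serializable class whose
    // link to its worker is marked @transient, so it is not written out
    // with the persisted state.
    class DriverInfoLike(val id: String) extends Serializable {
      @transient var worker: Option[String] = None // None while unassigned
    }

    object TransientNullDemo {
      def main(args: Array[String]): Unit = {
        val before = new DriverInfoLike("driver-1")
        before.worker = Some("worker-7")

        // Round-trip through Java serialization, as a ZooKeeper-backed
        // persistence engine would when the master recovers its state.
        val buf = new ByteArrayOutputStream()
        val oos = new ObjectOutputStream(buf)
        oos.writeObject(before)
        oos.close()
        val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
        val after = in.readObject().asInstanceOf[DriverInfoLike]

        // The @transient field is restored as null, not None: any method
        // call on it (e.g. after.worker.isDefined) throws an NPE.
        println(after.worker) // prints "null"
      }
    }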

Stack trace (on version 1.0.0, but reproducible on version 1.0.2, too):

[2014-08-14 21:44:59,519] ERROR (akka.actor.OneForOneStrategy)
java.lang.NullPointerException
        at org.apache.spark.deploy.master.Master$$anonfun$completeRecovery$5.apply(Master.scala:448)
        at org.apache.spark.deploy.master.Master$$anonfun$completeRecovery$5.apply(Master.scala:448)
        at scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
        at scala.collection.TraversableLike$class.filter(TraversableLike.scala:263)
        at scala.collection.AbstractTraversable.filter(Traversable.scala:105)
        at org.apache.spark.deploy.master.Master.completeRecovery(Master.scala:448)
        at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:376)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
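
The top frames show the NPE escaping from a filter predicate inside completeRecovery (Master.scala:448).
A hedged sketch, not the actual Spark code, of a null-tolerant predicate: wrapping the possibly-null
field in Option(...) maps null to None before any method is called on it.

    object GuardDemo extends App {
      // Option(x) maps a null reference to None, so this predicate never
      // calls a method on the possibly-null field directly. The name
      // hasNoWorker is hypothetical.
      def hasNoWorker(worker: Option[String]): Boolean =
        Option(worker).flatten.isEmpty

      println(hasNoWorker(null))             // true: null treated as missing
      println(hasNoWorker(None))             // true
      println(hasNoWorker(Some("worker-7"))) // false
    }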

How to reproduce: while running Spark in standalone mode on a cluster, kill both the master and the
driver processes on the same node.





