spark-user mailing list archives

From Cheney Sun <sun.che...@gmail.com>
Subject Re: master attempted to re-register the worker and then took all workers as unregistered
Date Tue, 08 Jul 2014 14:17:25 GMT
Yes, 0.9.1.


On Tue, Jul 8, 2014 at 10:26 PM, Nan Zhu <zhunanmcgill@gmail.com> wrote:

>  Hi, Cheney,
>
> Thanks for the information
>
> which version are you using, 0.9.1?
>
> Best,
>
> --
> Nan Zhu
>
> On Tuesday, July 8, 2014 at 10:09 AM, Cheney Sun wrote:
>
> Hi Nan,
>
> The problem is still there, just as I described before. The issue is said
> to have been addressed in a JIRA and resolved in a newer version, but I
> haven't had a chance to try it yet. If you find anything, please let me
> know.
>
> Thanks,
> Cheney
>
>
> On Tue, Jul 8, 2014 at 7:16 AM, Nan Zhu <zhunanmcgill@gmail.com> wrote:
>
>  Hey, Cheney,
>
> Is the problem still there?
>
> Sorry for the delay; I'm starting to look at this issue now.
>
> Best,
>
> --
> Nan Zhu
>
> On Tuesday, May 6, 2014 at 10:06 PM, Cheney Sun wrote:
>
> Hi Nan,
>
> In the worker's log, I see the following exception thrown when it tries to
> launch an executor. (SPARK_HOME is deliberately mis-specified, so the file
> "/usr/local/spark1/bin/compute-classpath.sh" does not exist.)
> After the exception was thrown several times, the worker was asked to kill
> the executor. After the kill, the worker tried to re-register with the
> master, but the master rejected the registration with the WARN message "Got
> heartbeat from unregistered worker
> worker-20140504140005-host-spark-online001".
>
> Looks like the issue wasn't fixed in 0.9.1. Do you know of any pull request
> addressing this issue? Thanks.
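As context for the rejection described above, here is a minimal Scala sketch of the heartbeat-handling pattern on the master's side: a heartbeat whose worker ID is missing from the registry is dropped with exactly this kind of warning instead of re-registering the worker. This is illustrative only; the real Spark 0.9 master is an Akka actor with its own message types, and every name below is an assumption for the sketch.

```scala
import scala.collection.mutable

// Illustrative only: the real Spark 0.9 master is an Akka actor with its
// own DeployMessages; MasterSketch, register, and onHeartbeat are all
// hypothetical names, not Spark's actual API.
class MasterSketch {
  // workerId -> timestamp of the last heartbeat received
  private val lastHeartbeat = mutable.HashMap[String, Long]()

  def register(workerId: String, now: Long): Unit =
    lastHeartbeat(workerId) = now

  // A heartbeat from a worker the master no longer tracks is dropped with
  // a warning rather than silently re-registering the worker. This is the
  // branch that produces the WARN quoted in the thread above.
  def onHeartbeat(workerId: String, now: Long): String =
    lastHeartbeat.get(workerId) match {
      case Some(_) =>
        lastHeartbeat(workerId) = now
        s"heartbeat recorded for $workerId"
      case None =>
        s"Got heartbeat from unregistered worker $workerId"
    }
}
```

Under this pattern, a worker that was dropped (for example after repeated executor-launch failures) keeps sending heartbeats that the master refuses until a full re-registration round trip succeeds, which matches the symptom reported in this thread.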
>
> java.io.IOException: Cannot run program "/usr/local/spark1/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
>         at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:600)
>         at org.apache.spark.deploy.worker.CommandUtils$.buildJavaOpts(CommandUtils.scala:58)
>         at org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:37)
>         at org.apache.spark.deploy.worker.ExecutorRunner.getCommandSeq(ExecutorRunner.scala:104)
>         at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:119)
>         at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:59)
> Caused by: java.io.IOException: error=2, No such file or directory
>         at java.lang.UNIXProcess.forkAndExec(Native Method)
>         at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
>         at java.lang.ProcessImpl.start(ProcessImpl.java:130)
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1021)
>         ... 6 more
> ......
> 14/05/04 21:35:45 INFO Worker: Asked to kill executor app-20140504213545-0034/18
> 14/05/04 21:35:45 INFO Worker: Executor app-20140504213545-0034/18 finished with state FAILED message class java.io.IOException: Cannot run program "/usr/local/spark1/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory
> 14/05/04 21:35:45 ERROR OneForOneStrategy: key not found: app-20140504213545-0034/18
> java.util.NoSuchElementException: key not found: app-20140504213545-0034/18
>         at scala.collection.MapLike$class.default(MapLike.scala:228)
>         at scala.collection.AbstractMap.default(Map.scala:58)
>         at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
>         at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:232)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/05/04 21:35:45 INFO Worker: Starting Spark worker host-spark-online001:7078 with 10 cores, 28.0 GB RAM
> 14/05/04 21:35:45 INFO Worker: Spark home: /usr/local/spark-0.9.1-cdh4.2.0
> 14/05/04 21:35:45 INFO WorkerWebUI: Started Worker web UI at http://host-spark-online001:8081
> 14/05/04 21:35:45 INFO Worker: Connecting to master spark://host-spark-online001:7077...
> 14/05/04 21:35:45 INFO Worker: Successfully registered with master spark://host-spark-online001:7077
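The NoSuchElementException in the trace above comes from `scala.collection.mutable.HashMap.apply`, which throws for a missing key (here, an executor the worker has already dropped when the ExecutorStateChanged message arrives). A minimal sketch of that failure mode and of the Option-based lookup that avoids it; the key is taken from the log, but whether the eventual upstream fix used `get` in this form is an assumption.

```scala
import scala.collection.mutable

// The worker keys its executors map by "appId/execId", as in the log.
val executors = mutable.HashMap[String, String]()

// HashMap.apply on a missing key throws NoSuchElementException; this is
// the failure at Worker.scala:232 in the trace above, which crashes the
// actor and triggers the OneForOneStrategy restart.
val thrown =
  try { executors("app-20140504213545-0034/18"); false }
  catch { case _: NoSuchElementException => true }

// A defensive lookup returns an Option instead of throwing, letting the
// handler ignore state changes for executors it no longer tracks.
val missing = executors.get("app-20140504213545-0034/18")
```

The actor crash matters because the restart shown in the log ("Starting Spark worker ...") races with the master's view of the worker, which is consistent with the re-registration symptom discussed in this thread.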
>
