What are the restrictions on using more than one SparkContext in a single Scala application? I did not see any documented limitations, but we did observe some bad behavior when trying to do this. The one I'm hitting now is that if I create a local context, stop it, and then create one connected to a standalone Spark cluster, my application dies with executor errors. The problem seems to be that the executors try to connect to the driver at host "localhost" instead of the driver's actual host name.
I see this in the executor logs:
Spark Executor Command: "java" "-cp" ":/home/xandrew/spark-0.9.1/conf:/home/xandrew/spark-0.9.1/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop1.0.4.jar" "-Xms1700M" "-Xmx1700M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@localhost:59844/user/CoarseGrainedScheduler" "4" "my-precious-3qp31.c.big-graph-gc1.internal" "1" "akka.tcp://sparkWorker@my-precious-3qp31.c.big-graph-gc1.internal:51524/user/Worker" "app-20140526143633-0003"
14/05/26 14:36:42 INFO CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@localhost:59844/user/CoarseGrainedScheduler
14/05/26 14:36:42 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@my-precious-3qp31.c.big-graph-gc1.internal:35392] -> [akka.tcp://spark@localhost:59844] disassociated! Shutting down.
Background: I'm trying to write a system that runs on a single host and, by default, does its computations using a local Spark context. When needed, it can spawn a real Spark cluster (I'm using Google Compute Engine to start virtual machines dynamically to host it) and switch over to a new context connected to that cluster.
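For reference, the sequence that triggers the problem looks roughly like the sketch below. This is a minimal reproduction, not my actual application: the object name, the app name and the master URL "spark://master-host:7077" are placeholders, and the small count job is only there to force executors to start.

```scala
import org.apache.spark.SparkContext

object ContextSwitchRepro {
  def main(args: Array[String]): Unit = {
    // Placeholder master URL; substitute the real standalone master.
    val clusterMaster = "spark://master-host:7077"

    // Step 1: default mode, a local context in the same JVM.
    val localSc = new SparkContext("local", "context-switch-repro")
    println(localSc.parallelize(1 to 100).count())
    localSc.stop()

    // Step 2: switch over to a new context on the standalone cluster.
    // In my case the executors launched for this context try to reach
    // the driver at "localhost" and then shut down.
    val clusterSc = new SparkContext(clusterMaster, "context-switch-repro")
    println(clusterSc.parallelize(1 to 100).count())
    clusterSc.stop()
  }
}
```

Only one context is alive at any time here; the first one is stopped before the second is created.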