Hi Ethan,

How are you specifying the master to Spark?

Recovering from a master failover is already handled by the underlying Mesos scheduler driver, but you have to point Spark at ZooKeeper instead of passing in the master URIs directly.
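For example (hostnames and the job name here are made up), pointing the driver at the ZooKeeper ensemble that the Mesos masters register with, rather than at an individual master, would look something like:

```shell
# Hypothetical hosts: zk1/zk2/zk3 are the ZooKeeper ensemble the Mesos
# masters use for leader election. With a zk:// URL the driver discovers
# the current leader instead of being pinned to one master URI.
spark-submit \
  --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos \
  my-job.jar
```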

Tim

On Mon, Jan 12, 2015 at 12:44 PM, Ethan Wolf <ethan.wolf@alum.mit.edu> wrote:
We are running Spark and Spark Streaming on Mesos (with multiple masters for
HA).
At launch, our Spark jobs successfully look up the current Mesos master from
ZooKeeper and spawn tasks.

However, when the Mesos master changes while the Spark job is executing, the
Spark driver seems to keep talking to the old Mesos master, and therefore
fails to launch any new tasks.
We are running long-running Spark Streaming jobs, so we have temporarily
switched to coarse-grained mode as a workaround, but that prevents us from
running in fine-grained mode, which we would prefer for some jobs.
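(For reference, a sketch of the workaround described above; hostnames are hypothetical. Coarse-grained mode is enabled with the `spark.mesos.coarse` property:)

```shell
# Temporary workaround: run in coarse-grained mode so long-lived executors
# are launched up front, rather than fine-grained tasks that require the
# driver to keep negotiating with the (possibly changed) Mesos master.
spark-submit \
  --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos \
  --conf spark.mesos.coarse=true \
  my-streaming-job.jar
```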

Looking at the code for MesosSchedulerBackend, it has empty implementations
of the reregistered (and disconnected) methods, which I believe would be
called when the master changes:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L202

http://mesos.apache.org/documentation/latest/app-framework-development-guide/

Are there any plans to implement master reregistration in the Spark
framework, or does anyone have suggested workarounds for long-running jobs
to deal with the Mesos master changing?  (Or is there something we are
doing wrong?)

Thanks



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Framework-handling-of-Mesos-master-change-tp21107.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
