spark-user mailing list archives

From aant00 <aan...@yahoo.com>
Subject Master failover and active jobs
Date Mon, 01 Feb 2016 22:19:55 GMT
Hi - 

I'm running Spark 1.5.2 in standalone mode with multiple masters, using
ZooKeeper for failover. When the active master goes down, it fails over
correctly to the standby, and running applications keep running. However,
the new active master's web UI marks them as "WAITING", and the workers
log entries like these:

16/01/30 00:51:13 ERROR Worker: Connection to master failed! Waiting for
master to reconnect... 
16/01/30 00:51:13 WARN Worker: Failed to connect to master XXX:7077 
akka.actor.ActorNotFound: Actor not found for:
ActorSelection[Anchor(akka.tcp://sparkMaster@XXX:7077/), Path(/user/Master)] 

Shouldn't they still be "RUNNING"? On one occasion the job appeared to stop
functioning (it is a continuously running streaming job), but I haven't
been able to reproduce that. FWIW, the driver that started it is still marked
as "RUNNING".
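For reference, here is a minimal sketch of the kind of setup described above. It is illustrative only and is not the exact configuration in use here. The hostnames `master1`, `master2`, and `zk1` are hypothetical. On the master side, each master would typically be started with `-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181` in `SPARK_DAEMON_JAVA_OPTS`. On the driver side, the master URL lists every master so the application can re-register with whichever one ZooKeeper elects as leader:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: "master1" and "master2" are hypothetical hostnames,
// and 7077 is the default standalone master port.
object MultiMasterSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("multi-master-sketch")
      // Listing all masters lets the driver find the current leader
      // after a failover instead of being pinned to a single host.
      .setMaster("spark://master1:7077,master2:7077")
    val sc = new SparkContext(conf)
    try {
      // Trivial job to confirm the application registered and executors run.
      val total = sc.parallelize(1 to 100).sum()
      println(s"sum = $total")
    } finally {
      sc.stop()
    }
  }
}
```

Even with this setup, the applications may show up as "WAITING" in the new master's UI, as described above.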

Thanks. 
- Anthony 



