spark-user mailing list archives

From Nan Zhu <zhunanmcg...@gmail.com>
Subject master attempted to re-register the worker and then took all workers as unregistered
Date Wed, 15 Jan 2014 01:53:20 GMT
Hi, all  

I’m trying to deploy Spark in standalone mode, and at first everything looks normal: the web UI is accessible, and the master log shows that all workers registered:

14/01/15 01:37:30 INFO Slf4jEventHandler: Slf4jEventHandler started
14/01/15 01:37:31 INFO ActorSystemImpl: RemoteServerStarted@akka://sparkMaster@172.31.36.93:7077
14/01/15 01:37:31 INFO Master: Starting Spark master at spark://172.31.36.93:7077
14/01/15 01:37:31 INFO MasterWebUI: Started Master web UI at http://ip-172-31-36-93.us-west-2.compute.internal:8080
14/01/15 01:37:31 INFO Master: I have been elected leader! New state: ALIVE
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-34-61.us-west-2.compute.internal:37914
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-40-28.us-west-2.compute.internal:43055
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-34-61.us-west-2.compute.internal:37914 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-45-211.us-west-2.compute.internal:55355
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-40-28.us-west-2.compute.internal:43055 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-45-211.us-west-2.compute.internal:55355 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-41-251.us-west-2.compute.internal:47709
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-41-251.us-west-2.compute.internal:47709 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-43-78.us-west-2.compute.internal:36257
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-43-78.us-west-2.compute.internal:36257 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO ActorSystemImpl: RemoteClientStarted@akka://spark@ip-172-31-37-160.us-west-2.compute.internal:43086




However, when I launched an application, the master first logged “Attempted to re-register worker at same address” for every worker and then reported that all heartbeats came from “unregistered” workers. Can anyone tell me what happened here?

14/01/15 01:38:44 INFO Master: Registering app ALS
14/01/15 01:38:44 INFO Master: Registered app ALS with ID app-20140115013844-0000
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/0 on worker worker-20140115013734-ip-172-31-43-78.us-west-2.compute.internal-36257
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/1 on worker worker-20140115013734-ip-172-31-40-28.us-west-2.compute.internal-43055
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/2 on worker worker-20140115013734-ip-172-31-34-61.us-west-2.compute.internal-37914
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/3 on worker worker-20140115013734-ip-172-31-45-211.us-west-2.compute.internal-55355
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/4 on worker worker-20140115013734-ip-172-31-41-251.us-west-2.compute.internal-47709
14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-40-28.us-west-2.compute.internal:43055 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-40-28.us-west-2.compute.internal:43055
14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-34-61.us-west-2.compute.internal:37914 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-34-61.us-west-2.compute.internal:37914
14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-41-251.us-west-2.compute.internal:47709 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-41-251.us-west-2.compute.internal:47709
14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-45-211.us-west-2.compute.internal:55355 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-45-211.us-west-2.compute.internal:55355
14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-43-78.us-west-2.compute.internal:36257 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-43-78.us-west-2.compute.internal:36257
14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-34-61.us-west-2.compute.internal-37914
14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-45-211.us-west-2.compute.internal-55355
14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-40-28.us-west-2.compute.internal-43055
14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-43-78.us-west-2.compute.internal-36257
14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-41-251.us-west-2.compute.internal-47709
14/01/15 01:38:50 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-45-211.us-west-2.compute.internal-55355
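One clue shows up when the worker IDs in the two log excerpts are compared. The executors were launched on IDs stamped 20140115013734 (01:37:34, when the workers first registered). The rejected heartbeats, however, carry IDs stamped 20140115013844 for the same host and port. So the workers appear to have re-registered under fresh IDs at 01:38:44, just as the application started. The master refused those registrations because the address was already taken, and it then did not recognise the new IDs. The sketch below makes that comparison mechanical. It assumes, based only on these logs, that IDs have the form worker-<yyyyMMddHHmmss>-<host>-<port>. The helper names parse_worker_id and id_mismatches are hypothetical and are not part of Spark.

```python
import re
from datetime import datetime

# Hypothetical helper. Assumed ID layout, inferred only from the log excerpts:
#   worker-<yyyyMMddHHmmss>-<host>-<port>
WORKER_ID = re.compile(r"^worker-(\d{14})-(.+)-(\d+)$")


def parse_worker_id(worker_id):
    """Split a worker ID into (timestamp, host, port)."""
    m = WORKER_ID.match(worker_id)
    if m is None:
        raise ValueError(f"unrecognised worker ID: {worker_id}")
    stamp, host, port = m.groups()
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), host, int(port)


def id_mismatches(assigned_ids, heartbeat_ids):
    """Pair IDs by (host, port); return the pairs whose timestamps differ."""
    assigned = {}
    for wid in assigned_ids:
        ts, host, port = parse_worker_id(wid)
        assigned[(host, port)] = ts
    mismatches = []
    for wid in heartbeat_ids:
        ts, host, port = parse_worker_id(wid)
        old = assigned.get((host, port))
        if old is not None and old != ts:
            mismatches.append((host, port, old, ts))
    return mismatches


# IDs copied from the "Launching executor ... on worker" lines.
assigned = [
    "worker-20140115013734-ip-172-31-43-78.us-west-2.compute.internal-36257",
    "worker-20140115013734-ip-172-31-40-28.us-west-2.compute.internal-43055",
    "worker-20140115013734-ip-172-31-34-61.us-west-2.compute.internal-37914",
    "worker-20140115013734-ip-172-31-45-211.us-west-2.compute.internal-55355",
    "worker-20140115013734-ip-172-31-41-251.us-west-2.compute.internal-47709",
]
# IDs copied from the "Got heartbeat from unregistered worker" lines (deduplicated).
heartbeats = [
    "worker-20140115013844-ip-172-31-34-61.us-west-2.compute.internal-37914",
    "worker-20140115013844-ip-172-31-45-211.us-west-2.compute.internal-55355",
    "worker-20140115013844-ip-172-31-40-28.us-west-2.compute.internal-43055",
    "worker-20140115013844-ip-172-31-43-78.us-west-2.compute.internal-36257",
    "worker-20140115013844-ip-172-31-41-251.us-west-2.compute.internal-47709",
]

if __name__ == "__main__":
    for host, port, old, new in id_mismatches(assigned, heartbeats):
        print(f"{host}:{port}: executors went to ID from {old:%H:%M:%S}, "
              f"heartbeats come from ID from {new:%H:%M:%S}")
```

With the IDs above, every one of the five workers is reported, each with a 01:37:34 ID on the executor side and a 01:38:44 ID on the heartbeat side. This suggests that the question is really why the workers re-registered at the moment the executors were launched.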




Thank you very much!

Best,

--  
Nan Zhu

