spark-user mailing list archives

From Mayur Rustagi <mayur.rust...@gmail.com>
Subject Re: Master not seeing recovered nodes("Got heartbeat from unregistered worker ....")
Date Fri, 13 Jun 2014 14:44:49 GMT
I have also had trouble with workers joining the working set. I have
typically moved to a Mesos-based setup. Frankly, for high availability you
are better off using a cluster manager.

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Fri, Jun 13, 2014 at 8:57 AM, Yana Kadiyska <yana.kadiyska@gmail.com>
wrote:

> Hi, I see this has been asked before but has not gotten any satisfactory
> answer so I'll try again:
>
> (here is the original thread I found:
> http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3C1394044078706-2312.post@n3.nabble.com%3E
> )
>
> I have a set of workers dying and coming back again. The master prints the
> following warning:
>
> "Got heartbeat from unregistered worker ...."
>
> What is the solution to this -- rolling the master is very undesirable to
> me as I have a Shark context sitting on top of it (it's meant to be highly
> available).
>
> Insights appreciated -- I don't think an executor going down is very
> unexpected but it does seem odd that it won't be able to rejoin the working
> set.
>
> I'm running Spark 0.9.1 on CDH
