spark-user mailing list archives

From Gino Bustelo <g...@bustelos.com>
Subject Re: Master not seeing recovered nodes("Got heartbeat from unregistered worker ....")
Date Fri, 13 Jun 2014 20:58:59 GMT
I get the same problem, but I'm running in a dev environment based on
Docker scripts. The additional issue is that the worker processes do not
die, so the Docker container does not exit. I end up with worker
containers that are not participating in the cluster.


On Fri, Jun 13, 2014 at 9:44 AM, Mayur Rustagi <mayur.rustagi@gmail.com>
wrote:

> I have also had trouble with workers joining the working set. I have
> typically moved to a Mesos-based setup. Frankly, for high availability you are
> better off using a cluster manager.
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Fri, Jun 13, 2014 at 8:57 AM, Yana Kadiyska <yana.kadiyska@gmail.com>
> wrote:
>
>> Hi, I see this has been asked before but has not gotten any satisfactory
>> answer so I'll try again:
>>
>> (here is the original thread I found:
>> http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3C1394044078706-2312.post@n3.nabble.com%3E
>> )
>>
>> I have a set of workers dying and coming back again. The master prints
>> the following warning:
>>
>> "Got heartbeat from unregistered worker ...."
>>
>> What is the solution to this -- rolling the master is very undesirable to
>> me as I have a Shark context sitting on top of it (it's meant to be highly
>> available).
>>
>> Insights appreciated -- I don't think an executor going down is very
>> unexpected but it does seem odd that it won't be able to rejoin the working
>> set.
>>
>> I'm running Spark 0.9.1 on CDH
>>
>>
>>
>
