spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: Worker re-spawn and dynamic node joining
Date Fri, 16 May 2014 06:31:05 GMT
Hi Han :)

1. Is there a way to automatically re-spawn Spark workers? We have situations
where an executor OOM causes the worker process to go DEAD, and it does not
come back automatically.

=> Yes. You can either add an OOM killer
exception<http://backdrift.org/how-to-create-oom-killer-exceptions> for
all of your Spark processes, or you can run a cron job that periodically
checks your worker processes and restarts any worker that has gone down.
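Here is a minimal Python sketch of the cron-job approach. It assumes a Unix host where the worker was launched by Spark's standalone scripts, so the worker JVM's command line contains the class `org.apache.spark.deploy.worker.Worker`. The script name `spark_worker_watchdog.py` and the helper names are hypothetical. The start script and its arguments differ between Spark versions, so the restart command is passed on the command line instead of being hard-coded.

```python
import subprocess
import sys

# Class of the standalone worker JVM; assumed to appear on the worker's
# `java` command line when it is started by Spark's scripts.
WORKER_CLASS = "org.apache.spark.deploy.worker.Worker"


def worker_running(ps_output: str, pattern: str = WORKER_CLASS) -> bool:
    """Return True if any command line in a `ps` listing contains `pattern`."""
    return any(pattern in line for line in ps_output.splitlines())


def list_processes() -> str:
    # `-ww` disables column truncation, so long java command lines (where the
    # classpath comes before the class name) are printed in full.
    result = subprocess.run(["ps", "-ww", "-eo", "args"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def ensure_worker(ps_output: str, restart) -> bool:
    """Call `restart()` when no worker is running; return True if it was called."""
    if worker_running(ps_output):
        return False
    restart()
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is the command that starts a worker,
    # e.g. the sbin start script for your Spark version plus the master URL.
    restart_cmd = sys.argv[1:]
    ensure_worker(list_processes(), lambda: subprocess.run(restart_cmd, check=True))
```

A crontab entry such as `* * * * * python3 /path/to/spark_worker_watchdog.py /path/to/spark/sbin/<start script> spark://<master-host>:7077` would run the check once a minute. All paths in that entry are hypothetical. The `ps` parsing is kept separate from the restart call so the decision logic can be tested without a real cluster.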

  2. How can we dynamically add worker machines to (or remove them from) the
cluster? We'd like to use an EC2 auto-scaling group, for example.

=> You can add or remove worker nodes on the fly. To add a worker, launch a
new machine, add its IP address to the *slaves* file on the master node, and
rsync the Spark directory to all worker machines, including the new one.
Then run the *start-all.sh* script on the master node to bring the new
worker into service. Removing a worker works the same way: delete the
worker's IP address from the master's *slaves* file and restart your
slaves, and the worker will be removed.
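The *slaves* file bookkeeping described above can be sketched in Python as follows. This is an illustration under assumptions, not part of Spark. The helper names are hypothetical, and the sketch assumes the usual `conf/slaves` layout: one host per line, with `#` comments. It only computes the new file contents and the `rsync` commands. Writing the file, running the commands, and invoking *start-all.sh* are left to you.

```python
from typing import Iterable, List


def hosts_of(lines: Iterable[str]) -> List[str]:
    """Active worker entries: non-blank lines that are not `#` comments."""
    return [line.strip() for line in lines
            if line.strip() and not line.strip().startswith("#")]


def updated_hosts(lines: List[str], add: Iterable[str] = (),
                  remove: Iterable[str] = ()) -> List[str]:
    """Return slaves-file lines with `remove` hosts dropped and `add` hosts appended.

    Comments and blank lines are preserved; a host already listed is not added twice.
    """
    drop = set(remove)
    kept = [line for line in lines if line.strip() not in drop]
    present = set(hosts_of(kept))
    for host in add:
        if host not in present:
            kept.append(host)
            present.add(host)
    return kept


def rsync_commands(spark_home: str, hosts: Iterable[str]) -> List[List[str]]:
    """One rsync invocation per host, pushing the local Spark directory
    (including the updated conf/slaves) to the same path on that host."""
    src = spark_home.rstrip("/") + "/"
    return [["rsync", "-az", src, f"{host}:{src}"] for host in hosts]
```

For example, `updated_hosts(lines, add=["10.0.0.42"])` appends a new worker. `rsync_commands("/opt/spark", hosts_of(new_lines))` then yields one `rsync -az` invocation per listed host, which you could run with `subprocess.run`. The `/opt/spark` path and the IP address are hypothetical.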


FYI, we have a web-based deployment tool that we use internally. It is
built on top of the spark-ec2 script (with some changes) and has a module
for adding and removing worker nodes on the fly. It looks like the attached
screenshot. If you want, I can give you access.

Thanks
Best Regards


On Wed, May 14, 2014 at 9:52 PM, Han JU <ju.han.felix@gmail.com> wrote:

> Hi all,
>
> Just 2 questions:
>
>   1. Is there a way to automatically re-spawn Spark workers? We have
> situations where an executor OOM causes the worker process to go DEAD, and
> it does not come back automatically.
>
>   2. How can we dynamically add worker machines to (or remove them from) the
> cluster? We'd like to use an EC2 auto-scaling group, for example.
>
> We're using spark-standalone.
>
> Thanks a lot.
>
> --
> *JU Han*
>
> Data Engineer @ Botify.com
>
> +33 0619608888
>
