spark-user mailing list archives

From Sung Hwan Chung <coded...@cs.stanford.edu>
Subject Re: Dynamically adding/removing slaves through start-slave.sh and stop-slave.sh
Date Mon, 28 Mar 2016 22:57:44 GMT
You mean that once a job is in a waiting queue, it won't take advantage of
additional workers that happened to be added after the job was put into the
waiting queue?

That would be less than optimal, but it would be OK with us for now, as long
as future-submitted jobs take advantage of the additional workers.

On Mon, Mar 28, 2016 at 10:40 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com
> wrote:

> The ACID test will come when you start two or more Spark processes
> simultaneously. If you see queuing (i.e. the second job waiting for the
> first to finish in the Spark GUI), then you may not have enough resources
> for the cluster to accommodate both jobs despite the additional worker process.
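
A minimal way to run that test is sketched below. The master URL, UI address,
and jar/class names are placeholders, not from this thread:

```shell
# Hedged sketch of the "ACID test" above: submit two applications at once
# and then watch the standalone master UI for queuing. Master address and
# application names are assumptions.
SPARK_MASTER_URL="spark://master-host:7077"   # assumed master address
MASTER_UI="http://master-host:8080"           # standalone master web UI

# The real test would be (commented out so this sketch stays inert):
#   spark-submit --master "$SPARK_MASTER_URL" --class app.JobOne app.jar &
#   spark-submit --master "$SPARK_MASTER_URL" --class app.JobTwo app.jar &
#   wait
# If the second application shows as WAITING in the UI, the cluster lacks
# free cores/memory to run both, despite the extra worker.
echo "check $MASTER_UI for a WAITING application"
```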
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 28 March 2016 at 23:30, Sung Hwan Chung <codedeft@cs.stanford.edu>
> wrote:
>
>> Yea, that seems to be the case. It seems that dynamically resizing a
>> standalone Spark cluster is very simple.
>>
>> Thanks!
>>
>> On Mon, Mar 28, 2016 at 10:22 PM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> start-all.sh starts the master and everything listed in the slaves file;
>>> start-master.sh starts the master only.
>>>
>>> I use start-slaves.sh for my purpose, with the added nodes in the slaves file.
>>>
>>> When you run start-slave.sh <MASTER_IP_ADDR>, you create another worker
>>> process on the master host. You can check the status in the Spark GUI
>>> at <HOST>:8080. Depending on the memory/core ratio for the worker process,
>>> the additional worker may or may not be used.
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 28 March 2016 at 22:58, Sung Hwan Chung <codedeft@cs.stanford.edu>
>>> wrote:
>>>
>>>> It seems that the conf/slaves file is only for consumption by the
>>>> following scripts:
>>>>
>>>> sbin/start-slaves.sh
>>>> sbin/stop-slaves.sh
>>>> sbin/start-all.sh
>>>> sbin/stop-all.sh
>>>>
>>>> I.e., the conf/slaves file doesn't affect a running cluster.
>>>>
>>>> Is this true?
>>>>
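
For what it's worth, start-slaves.sh essentially reads the conf/slaves host
list and launches a worker on each host over ssh. A simplified sketch of that
loop (the helper name and master address here are made up for illustration):

```shell
# Simplified sketch of what sbin/start-slaves.sh effectively does: iterate
# over the slaves file and start a worker on each host via ssh. Inert: it
# prints the commands instead of running them.
start_cmds() {
  while read -r host; do
    case "$host" in \#*|"") continue ;; esac  # skip comments and blank lines
    echo "ssh $host \$SPARK_HOME/sbin/start-slave.sh spark://master-host:7077"
  done
}

# Print the commands for a hypothetical two-node slaves file:
printf 'node1\n# a comment\nnode2\n' | start_cmds
```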
>>>>
>>>> On Mon, Mar 28, 2016 at 9:31 PM, Sung Hwan Chung <
>>>> codedeft@cs.stanford.edu> wrote:
>>>>
>>>>> No, I didn't add it to the conf/slaves file.
>>>>>
>>>>> What I want to do is leverage auto-scaling on AWS without needing to
>>>>> stop all the slaves (e.g. if a lot of slaves are idle, terminate them).
>>>>>
>>>>> Also, the bookkeeping is easier if I don't have to maintain a
>>>>> centralized slave list that must be modified every time a node
>>>>> is added/removed.
>>>>>
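
One way that auto-scaling could work is a boot script (e.g. EC2 user-data) on
each new instance that joins the cluster itself, so no central list needs
editing. A hedged sketch; the install path and master address are assumptions:

```shell
#!/bin/sh
# Hedged sketch: boot-time script for an auto-scaled instance that joins a
# running standalone cluster on its own. Paths and the master address are
# placeholders, not from this thread.
SPARK_HOME=/opt/spark                          # assumed install location
SPARK_MASTER_URL="spark://master-host:7077"    # assumed master address

# At boot this would run (kept as an echo so the sketch stays inert):
echo "$SPARK_HOME/sbin/start-slave.ssh $SPARK_MASTER_URL" >/dev/null
echo "$SPARK_HOME/sbin/start-slave.sh $SPARK_MASTER_URL"
```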
>>>>>
>>>>> On Mon, Mar 28, 2016 at 9:20 PM, Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>> Have you added the slave host name to $SPARK_HOME/conf/slaves?
>>>>>>
>>>>>> Then you can use start-slaves.sh or stop-slaves.sh for all instances.
>>>>>>
>>>>>> The assumption is that the slave boxes have $SPARK_HOME installed in the
>>>>>> same directory as on the master.
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 28 March 2016 at 22:06, Sung Hwan Chung <codedeft@cs.stanford.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I found that I could dynamically add/remove workers in a running
>>>>>>> standalone Spark cluster by simply running:
>>>>>>>
>>>>>>> start-slave.sh <SPARK_MASTER_ADDR>
>>>>>>>
>>>>>>> and
>>>>>>>
>>>>>>> stop-slave.sh
>>>>>>>
>>>>>>> E.g., I could instantiate a new AWS instance and just add it to a
>>>>>>> running cluster without adding it to the slaves file and restarting
>>>>>>> the whole cluster. It seems that there's no need for me to stop a
>>>>>>> running cluster at all.
>>>>>>>
>>>>>>> Is this a valid way of dynamically resizing a Spark cluster (as of
>>>>>>> now, I'm not concerned about HDFS)? Or will there be unforeseen
>>>>>>> problems if nodes are added/removed this way?
>>>>>>>
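
The join/leave flow described in this message can be sketched as below. The
master address and install path are assumptions; adjust for your cluster:

```shell
# Hedged sketch of dynamically joining and leaving a running standalone
# cluster, per the message above. Inert: it echoes the join command rather
# than running it.
SPARK_HOME=/opt/spark                         # assumed install location
SPARK_MASTER_URL="spark://master-host:7077"   # assumed master address

# On the new node, join the running cluster:
#   "$SPARK_HOME/sbin/start-slave.sh" "$SPARK_MASTER_URL"
# Later, to remove the node again:
#   "$SPARK_HOME/sbin/stop-slave.sh"
echo "$SPARK_HOME/sbin/start-slave.sh $SPARK_MASTER_URL"
```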
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
