The acid test will come when you start two or more Spark processes simultaneously. If you see queuing (i.e., the second job waiting in the Spark GUI for the first job to finish), then you may not have enough resources for YARN to accommodate two jobs despite the additional worker process.
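To see why a second job can sit waiting even after a worker is added, here is a deliberately simplified model of executor placement. It is not Spark's scheduler. It assumes a standalone cluster (what this thread is about), executors of a fixed shape, and the standalone default where an application with no `spark.cores.max` set keeps taking cores until nothing fits. `Worker` and `schedule` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    host: str
    free_cores: int
    free_memory_mb: int


def schedule(workers: List[Worker], executor_cores: int, executor_memory_mb: int,
             max_cores: Optional[int] = None) -> int:
    """Greedily place executors for one application and return the cores granted.

    max_cores=None models leaving the total-cores cap unset: the application
    keeps taking executors until no worker has room for another one.
    """
    granted = 0
    placed = True
    while placed:
        placed = False
        # Round-robin over workers, at most one executor per worker per pass.
        for w in workers:
            if max_cores is not None and granted + executor_cores > max_cores:
                return granted
            if w.free_cores >= executor_cores and w.free_memory_mb >= executor_memory_mb:
                w.free_cores -= executor_cores
                w.free_memory_mb -= executor_memory_mb
                granted += executor_cores
                placed = True
    return granted


cluster = [Worker("master-host", 4, 8192), Worker("extra-worker", 4, 8192)]
first = schedule(cluster, executor_cores=2, executor_memory_mb=2048)
second = schedule(cluster, executor_cores=2, executor_memory_mb=2048)
print(first, second)  # 8 0 -> the second job gets nothing and waits
```

In this model, capping each job (e.g. `max_cores=4`) lets both jobs run side by side on the same two workers, which is the behaviour you would hope to see in the GUI.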

On 28 March 2016 at 23:30, Sung Hwan Chung wrote:
Yeah, that seems to be the case. Dynamically resizing a standalone Spark cluster appears to be very simple.


On Mon, Mar 28, 2016 at 10:22 PM, Mich Talebzadeh wrote:
start-all starts the master and everything listed in the slaves file; starting the master on its own starts only the master.

I use that for my purposes, with the added nodes listed in the slaves file.

When you run <MASTER_IP_ADD>, you are creating another worker process on the master host. You can check its status in the Spark GUI at <HOST>:8080. Depending on the memory-to-core ratio of the worker process, the additional worker may or may not be used.
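One way to read "depending on the memory-to-core ratio": a worker only helps if at least one executor of the requested shape fits in both its cores and its memory. The helper below is a hypothetical back-of-the-envelope check under that simplified assumption; it ignores any per-worker overheads and is not how Spark itself reports capacity.

```python
def executors_per_worker(worker_cores: int, worker_memory_mb: int,
                         executor_cores: int, executor_memory_mb: int) -> int:
    """How many executors of a given shape fit on one worker (simplified model)."""
    if executor_cores <= 0 or executor_memory_mb <= 0:
        raise ValueError("executor shape must be positive")
    by_cores = worker_cores // executor_cores
    by_memory = worker_memory_mb // executor_memory_mb
    # The scarcer resource decides: cores and memory must both be available.
    return min(by_cores, by_memory)


# 8 cores but only 4 GB: memory limits this worker to a single 4 GB executor.
print(executors_per_worker(8, 4096, 2, 4096))   # 1
# A 2-core worker never hosts 4-core executors, however much memory it has.
print(executors_per_worker(2, 16384, 4, 2048))  # 0
```

A result of 0 is the "may not be used" case: the extra worker registers and shows up in the GUI, but no executor is ever placed on it.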

On 28 March 2016 at 22:58, Sung Hwan Chung wrote:
It seems that the conf/slaves file is only consumed by the following scripts:


I.e., the conf/slaves file doesn't affect a running cluster.
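As I understand it, the launch scripts treat conf/slaves as a plain host list: one host per line, `#` starts a comment, blank lines are skipped, and the file is read once when the script runs. The function below is a hypothetical Python mirror of that parsing, only to make the point concrete that nothing keeps watching the file afterwards.

```python
from typing import List


def read_slave_hosts(text: str) -> List[str]:
    """Return host names from conf/slaves-style text.

    One host per line; anything after '#' is a comment; blank lines are ignored.
    """
    hosts = []
    for line in text.splitlines():
        host = line.split("#", 1)[0].strip()
        if host:
            hosts.append(host)
    return hosts


sample = """# workers
worker-1
worker-2   # added later

"""
print(read_slave_hosts(sample))  # ['worker-1', 'worker-2']
```

Because the list is consumed only at launch time, editing the file later changes what the next script invocation does, not which workers are currently registered with the master.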

Is this true?

On Mon, Mar 28, 2016 at 9:31 PM, Sung Hwan Chung wrote:
No, I didn't add it to the conf/slaves file.

What I want to do is leverage AWS auto-scaling without needing to stop all the slaves (e.g., if many slaves are idle, terminate them).

Also, the bookkeeping is easier if I don't have to maintain a centralized slave list that must be modified every time a node is added or removed.
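For the "terminate idle slaves" part, the master itself can serve as the source of truth instead of a hand-maintained list. The sketch below assumes the standalone master exposes its status as JSON at `<master UI>/json` with a `workers` array whose entries carry `host`, `state` and `coresused`; treat those field names as assumptions and check them against your Spark version. It only picks candidates; the actual termination call to AWS is left out.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_master_state(master_ui: str) -> Dict[str, Any]:
    """Fetch the master's JSON status (assumed endpoint: <master_ui>/json)."""
    with urllib.request.urlopen(master_ui.rstrip("/") + "/json", timeout=10) as resp:
        return json.load(resp)


def idle_workers(state: Dict[str, Any]) -> List[str]:
    """Hosts of ALIVE workers with no cores in use (assumed field names)."""
    return [
        w["host"]
        for w in state.get("workers", [])
        if w.get("state") == "ALIVE" and w.get("coresused", 0) == 0
    ]


state = {"workers": [
    {"host": "10.0.0.5", "state": "ALIVE", "coresused": 4},
    {"host": "10.0.0.6", "state": "ALIVE", "coresused": 0},
    {"host": "10.0.0.7", "state": "DEAD", "coresused": 0},
]}
print(idle_workers(state))  # ['10.0.0.6']
```

Checking idleness first matters because terminating a worker that still hosts executors takes those executors down with it.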

On Mon, Mar 28, 2016 at 9:20 PM, Mich Talebzadeh wrote:
Have you added the slave host name to $SPARK_HOME/conf?

Then you can use or for all instances

This assumes that Spark is installed on the slave boxes under the same $SPARK_HOME directory as on the master.


On 28 March 2016 at 22:06, Sung Hwan Chung wrote:

I found that I could dynamically add/remove workers to/from a running standalone Spark cluster by simply triggering: (SPARK_MASTER_ADDR)


E.g., I could launch a new AWS instance and simply add it to a running cluster without adding it to the slaves file or restarting the whole cluster.
There seems to be no need to stop a running cluster.

Is this a valid way of dynamically resizing a Spark cluster (for now, I'm not concerned about HDFS)? Or will there be unforeseen problems if nodes are added/removed this way?