Yeah, that seems to be the case. It seems that dynamically resizing a standalone Spark cluster is very simple.

Thanks!

On Mon, Mar 28, 2016 at 10:22 PM, Mich Talebzadeh <email@example.com> wrote:

start-all.sh starts the master and anything else in the slaves file.
start-master.sh starts the master only.

I use start-slaves.sh for my purpose, with the added nodes in the slaves file.

When you run start-slave.sh <MASTER_IP_ADD> you are creating another worker process on the master host. You can check the status on the Spark GUI at <HOST>:8080. Depending on the ratio of memory to cores for the worker process, the additional worker may or may not be used.

On 28 March 2016 at 22:58, Sung Hwan Chung <firstname.lastname@example.org> wrote:

It seems that the conf/slaves file is only for consumption by the following scripts:

sbin/start-slaves.sh
sbin/stop-slaves.sh
sbin/start-all.sh
sbin/stop-all.sh

I.e., the conf/slaves file doesn't affect a running cluster.
Is this true?

On Mon, Mar 28, 2016 at 9:31 PM, Sung Hwan Chung <email@example.com> wrote:

No, I didn't add it to the conf/slaves file.

What I want to do is leverage auto-scaling from AWS, without needing to stop all the slaves (e.g., if a lot of slaves are idle, terminate those).
Also, the book-keeping is easier if I don't have to deal with a centralized list of slaves that needs to be modified every time a node is added or removed.

On Mon, Mar 28, 2016 at 9:20 PM, Mich Talebzadeh <firstname.lastname@example.org> wrote:

Have you added the slave host name to $SPARK_HOME/conf?

Then you can use start-slaves.sh or stop-slaves.sh for all instances.

The assumption is that $SPARK_HOME on the slave boxes is the same directory as $SPARK_HOME on the master.

HTH

On 28 March 2016 at 22:06, Sung Hwan Chung <email@example.com> wrote:

Hello,

I found that I could dynamically add workers to, and remove them from, a running standalone Spark cluster by simply triggering:

start-slave.sh (SPARK_MASTER_ADDR)

and

stop-slave.sh

E.g., I could instantiate a new AWS instance and just add it to a running cluster without needing to add it to the slaves file and restart the whole cluster.

It seems that there's no need for me to stop a running cluster.
Is this a valid way of dynamically resizing a Spark cluster (as of now, I'm not concerned about HDFS)? Or will there be certain unforeseen problems if nodes are added or removed this way?
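
For reference, the per-node approach described in this thread could be wrapped roughly as below. This is only a sketch under stated assumptions: the default SPARK_HOME path, the master host name, port 7077 and the DRY_RUN switch are illustrative and do not come from the thread. Only sbin/start-slave.sh (taking the master address) and sbin/stop-slave.sh are the scripts actually discussed above.

```shell
#!/usr/bin/env bash
# Sketch: attach this node's worker to a running standalone master, or detach it.
# Assumed values (hypothetical, adjust for your setup):
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
SPARK_MASTER_URL="${SPARK_MASTER_URL:-spark://master.example.internal:7077}"
# DRY_RUN=1 only prints the command instead of running it (hypothetical helper switch).
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"        # show what would be executed
  else
    "$@"             # execute the command as given
  fi
}

# Start a worker on this node and register it with the running master.
join_cluster() {
  run "$SPARK_HOME/sbin/start-slave.sh" "$SPARK_MASTER_URL"
}

# Stop the worker on this node; the master and the other workers keep running.
leave_cluster() {
  run "$SPARK_HOME/sbin/stop-slave.sh"
}
```

A new instance might call join_cluster from its boot script, and call leave_cluster before it is terminated. Neither step touches conf/slaves.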