spark-issues mailing list archives

From "Nicholas Chammas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-5189) Reorganize EC2 scripts so that nodes can be provisioned independent of Spark master
Date Sat, 10 Jan 2015 19:43:34 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272660#comment-14272660 ]

Nicholas Chammas commented on SPARK-5189:
-----------------------------------------

cc [~joshrosen] and [~shivaram] - What do y'all think?

> Reorganize EC2 scripts so that nodes can be provisioned independent of Spark master
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-5189
>                 URL: https://issues.apache.org/jira/browse/SPARK-5189
>             Project: Spark
>          Issue Type: Improvement
>          Components: EC2
>            Reporter: Nicholas Chammas
>
> As of 1.2.0, we launch Spark clusters on EC2 by setting up the master first, then setting up all the slaves together. This includes broadcasting files from the lonely master to potentially hundreds of slaves.
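>
> For a concrete picture, the broadcast step amounts to roughly the following. This is a simplified sketch of the pattern, not the actual {{copy-dir}} script:
> {code:bash}
> #!/usr/bin/env bash
> # Simplified sketch of the master-to-slaves broadcast pattern.
> # The slaves-file path follows the spark-ec2 convention; the rest is
> # illustrative rather than a copy of the real script.
> DIR=$1
> for slave in $(cat /root/spark-ec2/slaves); do
>   # Every transfer originates from the single master node.
>   rsync -az -e "ssh -o StrictHostKeyChecking=no" "$DIR" "root@$slave:$(dirname "$DIR")"
> done
> {code}
> With hundreds of slaves, the master pushes the same bytes once per slave over its own network link, which is why launch time scales with cluster size.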
> There are 2 main problems with this approach:
> # Broadcasting files from the master to all slaves using [{{copy-dir}}|https://github.com/mesos/spark-ec2/blob/branch-1.3/copy-dir.sh] (e.g. during [ephemeral-hdfs init|https://github.com/mesos/spark-ec2/blob/3a95101c70e6892a8a48cc54094adaed1458487a/ephemeral-hdfs/init.sh#L36], or during [Spark setup|https://github.com/mesos/spark-ec2/blob/3a95101c70e6892a8a48cc54094adaed1458487a/spark/setup.sh#L3]) takes a long time, and that time grows as the number of slaves grows.
> # It's more complicated to add slaves to an existing cluster (a la [SPARK-2008]), since slaves are only configured through the master during the setup of the master itself.
> Logically, the operations we want to implement are (a rough sketch of matching command shapes follows this list):
> * Provision a Spark node
> * Join a node to a cluster (including an empty cluster) as either a master or a slave
> * Remove a node from a cluster
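>
> As a sketch, these operations might map onto command shapes like the following. The script names echo the ones proposed later in this issue, but every flag here is a placeholder rather than a settled interface, and {{leave-cluster.sh}} is purely hypothetical:
> {code:bash}
> # Hypothetical command shapes for the three operations above.
> ./provision-spark-node.sh                          # install Spark, HDFS, etc.; no cluster state yet
> ./join-to-cluster.sh --role master                 # make this node the master of a new cluster
> ./join-to-cluster.sh --role slave --master <host>  # attach this node to an existing master
> ./leave-cluster.sh                                 # detach this node and clean up its cluster state
> {code}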
> We need our scripts to be organized so that they roughly match the above operations. The goals would be:
> # When launching a cluster, enable all cluster nodes to be provisioned in parallel, removing the master-to-slave file broadcast bottleneck (see the sketch after this list).
> # Facilitate cluster modifications like adding or removing nodes.
> # Enable exploration of infrastructure tools like [Terraform|https://www.terraform.io/] that might simplify {{spark-ec2}} internals and perhaps even allow us to build [one tool that launches Spark clusters on several different cloud platforms|https://groups.google.com/forum/#!topic/terraform-tool/eD23GLLkfDw].
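>
> For goal 1, the launch-side loop could look roughly like this; the hostnames are placeholders, and in practice the loop would live in {{spark_ec2.py}} or a thin wrapper around it:
> {code:bash}
> # Sketch: provision every node concurrently instead of pushing files
> # from the master one slave at a time.
> NODES=(node1.example.com node2.example.com)  # placeholders; really from the EC2 API
> for node in "${NODES[@]}"; do
>   ssh -o StrictHostKeyChecking=no "root@$node" ./provision-spark-node.sh &
> done
> wait  # total time ~ the slowest node, not the sum over all slaves
> {code}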
> More concretely, the modifications we need to make are:
> * Replace all occurrences of {{copy-dir}} or {{rsync}}-to-slaves with equivalent, slave-side operations.
> * Repurpose {{setup-slave.sh}} as {{provision-spark-node.sh}} and make sure it fully creates a node that can be used as either a master or slave.
> * Create a new script, {{join-to-cluster.sh}}, that takes a provisioned node, configures it as a master or slave, and joins it to a cluster (a skeleton sketch follows this list).
> * Move any remaining logic in {{setup.sh}} up to {{spark_ec2.py}} and delete that script.
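>
> To make the second and third bullets concrete, here is a minimal skeleton of what {{join-to-cluster.sh}} might look like. The argument handling, paths, and the idea of pulling shared config from S3 (rather than receiving it from the master) are illustrative assumptions, not settled design:
> {code:bash}
> #!/usr/bin/env bash
> # Hypothetical skeleton: turn an already-provisioned node into a master
> # or a slave. All paths and commands are illustrative.
> set -e
>
> ROLE=$1         # "master" or "slave"
> MASTER_HOST=$2  # required when ROLE=slave
>
> # Slave-side pull: the node fetches shared config itself, e.g. from S3,
> # replacing the copy-dir broadcast from the master:
> #   aws s3 cp --recursive s3://<cluster-bucket>/conf/ /root/spark/conf/
>
> case "$ROLE" in
>   master)
>     /root/spark/sbin/start-master.sh
>     ;;
>   slave)
>     /root/spark/sbin/start-slave.sh "spark://$MASTER_HOST:7077"
>     ;;
>   *)
>     echo "usage: $0 master|slave [master-host]" >&2
>     exit 1
>     ;;
> esac
> {code}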



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

