Very cool! Have you thought about sending this as a pull request? We'd be happy to maintain it inside Spark, though it might be interesting to find a single Python package that can manage clusters across both EC2 and GCE.


On May 5, 2014, at 7:18 AM, Akhil Das <> wrote:

Hi Sparkers,

We have created a quick spark_gce script that can launch a Spark cluster on Google Cloud. I'm sharing it because it might be helpful for anyone deploying on Google Cloud rather than AWS.

Here's the link to the script

Feel free to use it, and any feedback is welcome.

In short, here's what it does:

Just like the spark_ec2 script, this one reads certain command-line arguments, such as the cluster name (see the GitHub page for more details). It then starts the machines in Google Cloud, sets up the network, attaches an empty 500GB disk to every machine, generates SSH keys on the master and transfers them to all slaves, installs Java, and downloads and configures Spark/Shark/Hadoop. It also starts the Shark server automatically. Currently it installs version 0.9.1, but I'm happy to add support for more versions if anyone is interested.
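For readers who want a picture of the launch sequence described above, here is a minimal dry-run sketch in Python. It does not talk to Google Cloud at all; it only parses arguments and lists the setup steps in order. The flag names (`--slaves`, `--disk-gb`, `--spark-version`) and the node naming scheme are hypothetical, modeled loosely on spark_ec2; the real spark_gce arguments are documented on its GitHub page and may differ.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical arguments loosely modeled on spark_ec2; the actual
    # spark_gce flags may differ (see the project's GitHub page).
    parser = argparse.ArgumentParser(description="Dry-run sketch of a spark_gce-style launcher")
    parser.add_argument("cluster_name")
    parser.add_argument("--slaves", type=int, default=2)
    parser.add_argument("--disk-gb", type=int, default=500)
    parser.add_argument("--spark-version", default="0.9.1")
    return parser.parse_args(argv)


def plan_launch(args):
    """Return the ordered setup steps as strings; nothing is executed."""
    # Hypothetical naming scheme for the instances.
    master = f"{args.cluster_name}-master"
    slaves = [f"{args.cluster_name}-slave{i}" for i in range(1, args.slaves + 1)]
    nodes = [master] + slaves

    steps = [f"start instance {n}" for n in nodes]
    steps.append(f"configure network for {args.cluster_name}")
    steps += [f"attach empty {args.disk_gb}GB disk to {n}" for n in nodes]
    steps.append(f"generate ssh keys on {master}")
    steps += [f"copy ssh key from {master} to {s}" for s in slaves]
    steps += [f"install java on {n}" for n in nodes]
    steps.append(f"download and configure Spark {args.spark_version}, Shark and Hadoop")
    steps.append(f"start Shark server on {master}")
    return steps
```

For example, `plan_launch(parse_args(["demo", "--slaves", "2"]))` yields 15 steps, starting with `start instance demo-master` and ending with `start Shark server on demo-master`. Keeping the plan separate from execution makes it easy to print or review what a launch would do before touching any machines.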


