spark-dev mailing list archives

From Patrick Wendell <>
Subject EC2 Script Changes
Date Sat, 24 Aug 2013 22:23:58 GMT
Hi Everyone,

Today I merged a few improvements to the Spark EC2 scripts to master.
I wanted to take a moment to explain what they are and give some more
color on the purpose of these scripts and how we plan to maintain them
going forward. First, the new changes:

- Clusters can be created in any region
- We now support the beefier HVM instance types
- A specific version or git-tag of Spark can be selected when
launching a cluster
- Clusters can now be launched with newer versions of HDFS
- Mesos has been fully replaced with the Standalone scheduler
- There was substantial internal refactoring and clean-up
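To give a rough sense of how these options fit together, a launch
command with the new flags could look something like the sketch below.
The flag spellings and values here are illustrative (check
./spark-ec2 --help in your checkout for the exact names), and the key
pair, identity file, and cluster name are placeholders:

    ./spark-ec2 -k my-keypair -i ~/my-keypair.pem \
      --region=eu-west-1 \
      --instance-type=cc2.8xlarge \
      --spark-version=0.8.0 \
      -s 4 launch my-test-cluster

Here --region picks a non-default EC2 region, --instance-type selects
one of the HVM types, and --spark-version pins the Spark release (or
git tag) installed on the cluster.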

The purpose of these scripts is to make it extremely easy to create
ephemeral Spark clusters on EC2. In the past this has served two
audiences: (i) new users who want to experiment with Spark on a real
cluster and (ii) developers and researchers testing extensions to Spark.

Because these are the main goals, we’ve focused on
ease-of-provisioning and ensuring the cluster environment is as simple
and predictable as possible. This is in part why we’ve moved from
Mesos to the Standalone scheduler (as many know Spark can run on
Mesos, YARN, and its own simplified scheduler). We also tightly
control the OS, JVM version, installed packages, etc., so we can
support people easily on the mailing list who use this to kick the
tires with Spark.

If you are running older versions of the ec2 scripts, including those
in Spark 0.6/0.7, things will work just as they used to. This only
affects 0.8.0 and newer. I also wanted to note that we may extend
these scripts over time in ways that break *internal* compatibility
with earlier versions. If you are building applications on top of our
ec2 scripts, you should fork the `spark-ec2` repository and maintain
your own copy of the repo (not sure if anyone’s doing this though…).

Please feel free to test this out in the next few days and report any
issues to me or to the dev list. Hopefully this change will make it easier
for people to get started with Spark even if they have AWS data in
other regions.

- Patrick
