spark-dev mailing list archives

From "Nate D'Amico" <>
Subject RE: EC2 clusters ready in launch time + 30 seconds
Date Fri, 11 Jul 2014 00:10:15 GMT
You are partially correct.

It's not terribly complex, but also not easy to accomplish.  Sounds like you want to manage
some partially/fully baked AMIs with the core Spark libs and dependencies already on the
image.  The main issues that crop up are:

1) image sprawl: as libs/config/defaults/etc. change, images need to be "rebuilt" and references to them updated
2) cross-region support (not too huge a deal now with the copy functionality, just more complex
image mgmt.)

If you don’t want to restrict which instance types/sizes one can use, you also have an uptick
in image mgmt. complexity with:

3) instance type (need both standard and hvm)
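
To make the bookkeeping behind these three issues concrete, here is a minimal sketch, not anything spark-ec2 actually does. It assumes "standard" means paravirtual (`pv`) images, and the function names, Spark versions, and AMI IDs are all hypothetical placeholders. The point is only that the number of images to build and track is the product of regions, virtualization types, and software versions.

```python
from itertools import product
from typing import Dict, List, Tuple

# (region, virtualization type, Spark version)
ImageKey = Tuple[str, str, str]


def required_images(regions: List[str], virt_types: List[str],
                    spark_versions: List[str]) -> List[ImageKey]:
    """Enumerate every image variant that has to be built and tracked."""
    return list(product(regions, virt_types, spark_versions))


def resolve_ami(catalog: Dict[ImageKey, str], region: str,
                virt_type: str, spark_version: str) -> str:
    """Look up the AMI ID for one variant, failing loudly if it was never built."""
    key = (region, virt_type, spark_version)
    if key not in catalog:
        raise LookupError(f"no baked image for {key}; rebuild or copy one first")
    return catalog[key]


# Hypothetical inputs: the versions and AMI IDs are placeholders
# chosen for illustration, not real images.
variants = required_images(["us-east-1", "us-west-2"], ["pv", "hvm"],
                           ["1.0.0", "1.0.1"])
catalog = {key: f"ami-placeholder-{i}" for i, key in enumerate(variants)}
```

With two regions, two virtualization types, and two versions, `variants` already holds eight entries, and each of them has to be rebuilt whenever the baked libs or defaults change.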

I'm starting to work through some automation/config stuff for the Spark stack on EC2 as part
of a project. I'll be focusing the work through the Apache Bigtop effort to start, and can then
share with the Spark community directly as things progress, if people are interested.


-----Original Message-----
From: Nicholas Chammas [] 
Sent: Thursday, July 10, 2014 3:06 PM
To: dev
Subject: EC2 clusters ready in launch time + 30 seconds

Hi devs!

Right now it takes a non-trivial amount of time to launch EC2 clusters.
Part of this time is spent starting the EC2 instances, which is out of our control. Another
part of this time is spent installing stuff on and configuring the instances. This, we can control.

I’d like to explore approaches to upgrading spark-ec2 so that launching a cluster of any
size generally takes only 30 seconds on top of the time to launch the base EC2 instances.
Since Amazon can launch instances concurrently, I believe this means we should be able to
launch a fully operational Spark cluster of any size in constant time. Is that correct?
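
One way to read the constant-time question is as a difference between summing and taking a maximum. The sketch below is a toy model rather than a measurement: the function names and the 60-second boot time are hypothetical, and it assumes per-node setup has no shared bottleneck (for example, a master pushing files to each slave in turn).

```python
from typing import Sequence


def sequential_ready_time(boot_seconds: float,
                          setup_seconds: Sequence[float]) -> float:
    """Nodes are configured one after another: setup times add up."""
    return boot_seconds + sum(setup_seconds)


def concurrent_ready_time(boot_seconds: float,
                          setup_seconds: Sequence[float]) -> float:
    """Nodes are configured at the same time: the slowest node dominates."""
    return boot_seconds + max(setup_seconds, default=0.0)


# Hypothetical numbers: 60 s for EC2 to boot the instances,
# 30 s of setup on each of 100 nodes.
per_node_setup = [30.0] * 100
sequential_total = sequential_ready_time(60.0, per_node_setup)
concurrent_total = concurrent_ready_time(60.0, per_node_setup)
```

Under these assumptions the concurrent total stays at boot time plus 30 seconds regardless of cluster size, while the sequential total grows linearly with the node count.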

Do we already have an idea of what it would take to get to that point?

