spark-user mailing list archives

From unorthodox.engine...@gmail.com
Subject Re: creating new ami image for spark ec2 commands
Date Thu, 12 Jun 2014 19:34:04 GMT
Creating AMIs from scratch is a complete pain in the ass. If you have a spare week, sure. I
understand why the team avoids it.

The easiest way is probably to spin up a working instance and then use Amazon's "save as new
AMI", but that has some major limitations, especially for software that doesn't expect it. ("There
are two of me now!") Worker nodes might cope better than the master.

But yes, I too would love new AMIs that don't pull down 200 meg every time I spin up.
("Spin up a cluster in five minutes" HA!) Per-region AMIs are also good for costs. I've
thought about doing up new ones (since I have the experience), but I have no time and other issues
first. Perhaps once I know Spark better.

At least with Spark we have more control over the scripts, precisely because they are "primitive".
I had a quick look at YARN/Ambari, and it wasn't obvious they were any better with EC2, at
a hundred times the complexity.

I expect most AWS-heavy companies have a full time person just managing AMIs. They are that
annoying. It's what makes Cloudera attractive.

Jeremy Lee   BCompSci (Hons)
The Unorthodox Engineers

> On 6 Jun 2014, at 6:44 am, Matt Work Coarr <mattcoarr.work@gmail.com> wrote:
> 
> How would I go about creating a new AMI image that I can use with the spark ec2 commands?
I can't seem to find any documentation.  I'm looking for a list of steps that I'd need to
perform to make an Amazon Linux image ready to be used by the spark ec2 tools.
> 
> I've been reading through the spark 1.0.0 documentation, looking at the script itself
(spark_ec2.py), and looking at the github project mesos/spark-ec2.
> 
> From what I can tell, the spark_ec2.py script looks up the id of the AMI based on the
region and machine type (hvm or pvm) using static content derived from the github repo mesos/spark-ec2.
> 
> The spark ec2 script loads the AMI id from this base url:
> https://raw.github.com/mesos/spark-ec2/v2/ami-list
> (Which presumably comes from https://github.com/mesos/spark-ec2 )
> 
> For instance, since I'm working with us-east-1 and pvm, I'd end up with AMI id:
> ami-5bb18832
> 
> Is there a list of instructions for how this AMI was created?  Assuming I'm starting
with my own Amazon Linux image, what would I need to do to make it usable where I could pass
that AMI id to spark_ec2.py rather than using the default spark-provided AMI?
> 
> Thanks,
> Matt
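
For what it's worth, the lookup Matt describes is just a static file per (region, type) pair
under that base URL. A rough Python 3 sketch of the idea (the real spark_ec2.py is Python 2
and uses urllib2; the function names here are illustrative, not from the script):

```python
import urllib.request

# Base URL the spark ec2 script reads AMI ids from (from the message above).
AMI_PREFIX = "https://raw.github.com/mesos/spark-ec2/v2/ami-list"

def ami_url(region, vm_type="pvm"):
    """Build the URL of the file holding the AMI id for a region.

    vm_type is "pvm" or "hvm"; the file at this URL contains the bare
    AMI id (e.g. "ami-5bb18832") for that region/virtualization pair.
    """
    return "%s/%s/%s" % (AMI_PREFIX, region, vm_type)

def fetch_ami(region, vm_type="pvm"):
    """Fetch and return the AMI id string (requires network access)."""
    with urllib.request.urlopen(ami_url(region, vm_type)) as resp:
        return resp.read().decode("ascii").strip()
```

So pointing the script at your own AMI would mean publishing the same kind of per-region
file layout, or patching the script to skip the lookup entirely.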
