spark-user mailing list archives

From Jerry Lam
Subject Re: spark-ec2 vs. EMR
Date Tue, 01 Dec 2015 17:21:53 GMT
Simply put:

EMR = Hadoop Ecosystem (YARN, HDFS, etc.) + Spark + EMRFS + Amazon EMR API + Selected Instance Types + Amazon EC2 Friendly (bootstrapping)
spark-ec2 = HDFS + YARN (Optional) + Spark (Standalone Default) + Any Instance Type

I use spark-ec2 for prototyping and have never used it for production.

just my $0.02

> On Dec 1, 2015, at 11:15 AM, Nick Chammas wrote:
> Pinging this thread in case anyone has thoughts on the matter they want to share.
> On Sat, Nov 21, 2015 at 11:32 AM Nicholas Chammas <[hidden email]> wrote:
> Spark has come bundled with spark-ec2 for many years. At the same time, EMR has been capable of running Spark for a while, and earlier this year it added "official" support.
> If you're looking for a way to provision Spark clusters, there are some clear differences between these 2 options. I think the biggest one would be that EMR is a "production" solution backed by a company, whereas spark-ec2 is not really intended for production use (as far as I know).
> That particular difference in intended use may or may not matter to you, but I'm curious:
> What are some of the other differences between the 2 that do matter to you? If you were considering these 2 solutions for your use case at one point recently, why did you choose one over the other?
> I'd be especially interested in hearing about why people might choose spark-ec2 over EMR, since the latter option seems to have shaped up nicely this year.
> Nick
