spark-user mailing list archives

From Dana Powers <dana.pow...@gmail.com>
Subject Re: spark-ec2 vs. EMR
Date Wed, 02 Dec 2015 17:44:05 GMT
EMR was a pain to configure on a private VPC last I tried. Has anyone had
success with that? I found spark-ec2 easier to use with private networking,
but also agree that I would use for prod.

-Dana
On Dec 1, 2015 12:29 PM, "Alexander Pivovarov" <apivovarov@gmail.com> wrote:

> 1. EMR 4.2.0 has Zeppelin as an alternative to Databricks Notebooks
>
> 2. EMR has Ganglia 3.6.0
>
> 3. EMR has hadoop fs settings to make s3 work fast (direct.EmrFileSystem)
>
> 4. EMR has s3 keys in hadoop configs
>
> 5. EMR allows resizing the cluster on the fly.
>
> 6. EMR has aws sdk in spark classpath. Helps to reduce app assembly jar
> size
>
> 7. ec2 script installs all in /root, EMR has dedicated users: hadoop,
> zeppelin, etc. EMR is similar to Cloudera or Hortonworks
>
> 8. There are at least 3 spark-ec2 projects. (in apache/spark, in mesos, in
> amplab). Master branch in spark has outdated ec2 script. Other projects
> have broken links in readme. WHAT A MESS!
>
> 9. ec2 script has poor documentation and uninformative error messages.
> e.g. the readme does not say anything about the --private-ips option. If
> you do not add the flag, it connects to an empty-string host (localhost)
> instead of the master. Fixed only last week. Not sure if it is fixed in
> all branches
>
> 10. I think Amazon will include spark-jobserver in EMR soon.
>
> 11. You do not need to be an AWS expert to start an EMR cluster. Users can
> use the EMR web UI to start a cluster to run some jobs or work in Zeppelin
> during the day
>
> 12. An EMR cluster starts in about 8 min. The ec2 script takes longer and
> you need to be online.
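
[Editor's illustration of point 9 above, not part of the original message.]
The failure described there (an empty host string silently resolving to
localhost) can be avoided by refusing to return an empty address. The sketch
below is a minimal, hypothetical illustration: the function name
pick_address, its parameters, and MissingAddressError are assumptions made up
for this example and are not taken from the spark-ec2 source.

```python
from typing import Optional


class MissingAddressError(ValueError):
    """Raised when an instance has no usable address for the chosen mode."""


def pick_address(public_dns: Optional[str],
                 private_ip: Optional[str],
                 use_private_ips: bool = False) -> str:
    """Return the address to connect to, never an empty string.

    Hypothetical helper: use_private_ips stands in for a flag such as
    --private-ips; it is not the real script's API.
    """
    address = private_ip if use_private_ips else public_dns
    if not address:
        # An empty host is what ends up being treated as localhost,
        # so fail loudly and hint at the flag instead of continuing.
        hint = "" if use_private_ips else " (is a private-IP flag needed?)"
        raise MissingAddressError("instance has no usable address" + hint)
    return address
```

With a check like this, an instance in a private VPC that has no public DNS
name produces an explicit error rather than a connection attempt against the
local machine.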
> On Dec 1, 2015 9:22 AM, "Jerry Lam" <chilinglam@gmail.com> wrote:
>
>> Simply put:
>>
>> EMR = Hadoop Ecosystem (Yarn, HDFS, etc) + Spark + EMRFS + Amazon EMR API
>> + Selected Instance Types + Amazon EC2 Friendly (bootstrapping)
>> spark-ec2 = HDFS + Yarn (Optional) + Spark (Standalone Default) + Any
>> Instance Type
>>
>> I use spark-ec2 for prototyping and I have never used it for production.
>>
>> just my $0.02
>>
>>
>>
>> On Dec 1, 2015, at 11:15 AM, Nick Chammas <nicholas.chammas@gmail.com>
>> wrote:
>>
>> Pinging this thread in case anyone has thoughts on the matter they want
>> to share.
>>
>> On Sat, Nov 21, 2015 at 11:32 AM Nicholas Chammas <[hidden email]> wrote:
>>
>>> Spark has come bundled with spark-ec2
>>> <http://spark.apache.org/docs/latest/ec2-scripts.html> for many years.
>>> At the same time, EMR has been capable of running Spark for a while, and
>>> earlier this year it added "official" support
>>> <https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/>.
>>>
>>> If you're looking for a way to provision Spark clusters, there are some
>>> clear differences between these 2 options. I think the biggest one would be
>>> that EMR is a "production" solution backed by a company, whereas spark-ec2
>>> is not really intended for production use (as far as I know).
>>>
>>> That particular difference in intended use may or may not matter to you,
>>> but I'm curious:
>>>
>>> What are some of the other differences between the 2 that do matter to
>>> you? If you were considering these 2 solutions for your use case at one
>>> point recently, why did you choose one over the other?
>>>
>>> I'd be especially interested in hearing about why people might choose
>>> spark-ec2 over EMR, since the latter option seems to have shaped up nicely
>>> this year.
>>>
>>> Nick
>>>
>>>
