spark-user mailing list archives

From Jonathan Kelly <jonathaka...@gmail.com>
Subject Re: spark-ec2 vs. EMR
Date Sat, 05 Dec 2015 02:32:09 GMT
Sending this to the list again because I'm pretty sure it didn't work the
first time. A colleague just realized he was having the same problem with
the list not accepting his posts, but unsubscribing and re-subscribing
seemed to fix the issue for him. I've just unsubscribed and re-subscribed
too, so hopefully this works...

On Wednesday, December 2, 2015, Jonathan Kelly <jonathakamzn@gmail.com>
wrote:

> EMR is currently running a private preview of an upcoming feature allowing
> EMR clusters to be launched in VPC private subnets. This will allow you to
> launch a cluster in a subnet without an Internet Gateway attached. Please
> contact jonfritz@amazon.com if you would like more information.
>
> ~ Jonathan
>
> Note: jonfritz@amazon.com is not me. I'm a different Jonathan. :)
>
> On Wed, Dec 2, 2015 at 10:21 AM, Jerry Lam <chilinglam@gmail.com> wrote:
>
>> Hi Dana,
>>
>> Yes, we got VPC + EMR working, but I'm not the person who deployed it. It
>> is related to the subnet, as Alex points out.
>>
>> Just want to add another point: spark-ec2 is nice to keep and improve
>> because it allows users to run any version of Spark (a nightly build, for
>> example). EMR does not allow you to do that without a manual process.
>>
>> Best Regards,
>>
>> Jerry
>>
>> On Wed, Dec 2, 2015 at 1:02 PM, Alexander Pivovarov <apivovarov@gmail.com> wrote:
>>
>>> Do you think it's a security issue if EMR is started in a VPC with a
>>> subnet having "Auto-assign Public IP: Yes"?
>>>
>>> You can remove all inbound rules with a 0.0.0.0/0 source in the master
>>> and slave security groups. Then the master and slave boxes will be
>>> accessible only to users who are on the VPN.
>>>
>>> On Wed, Dec 2, 2015 at 9:44 AM, Dana Powers <dana.powers@gmail.com> wrote:
>>>
>>>> EMR was a pain to configure on a private VPC last I tried. Has anyone
>>>> had success with that? I found spark-ec2 easier to use with private
>>>> networking, but also agree that I would use for prod.
>>>>
>>>> -Dana
>>>> On Dec 1, 2015 12:29 PM, "Alexander Pivovarov" <apivovarov@gmail.com> wrote:
>>>>
>>>>> 1. EMR 4.2.0 has Zeppelin as an alternative to Databricks notebooks.
>>>>>
>>>>> 2. EMR has Ganglia 3.6.0.
>>>>>
>>>>> 3. EMR has Hadoop fs settings to make S3 work fast
>>>>> (direct.EmrFileSystem).
>>>>>
>>>>> 4. EMR has S3 keys in the Hadoop configs.
>>>>>
>>>>> 5. EMR allows you to resize the cluster on the fly.
>>>>>
>>>>> 6. EMR has the AWS SDK in the Spark classpath, which helps reduce the
>>>>> app assembly jar size.
>>>>>
>>>>> 7. The ec2 script installs everything in /root; EMR has dedicated users:
>>>>> hadoop, zeppelin, etc. EMR is similar to Cloudera or Hortonworks.
>>>>>
>>>>> 8. There are at least 3 spark-ec2 projects (in apache/spark, in
>>>>> mesos, in amplab). The master branch in spark has an outdated ec2 script.
>>>>> The other projects have broken links in their readmes. WHAT A MESS!
>>>>>
>>>>> 9. The ec2 script has poor documentation and uninformative error
>>>>> messages. E.g., the readme does not say anything about the --private-ips
>>>>> option. If you do not add the flag, it will connect to an empty-string
>>>>> host (localhost) instead of the master. This was fixed only last week;
>>>>> not sure if it is fixed in all branches.
>>>>>
>>>>> 10. I think Amazon will include spark-jobserver in EMR soon.
>>>>>
>>>>> 11. You do not need to be an AWS expert to start an EMR cluster. Users
>>>>> can use the EMR web UI to start a cluster to run some jobs or work in
>>>>> Zeppelin during the day.
>>>>>
>>>>> 12. An EMR cluster starts in about 8 min. The ec2 script takes longer,
>>>>> and you need to be online.
>>>>> On Dec 1, 2015 9:22 AM, "Jerry Lam" <chilinglam@gmail.com> wrote:
>>>>>
>>>>>> Simply put:
>>>>>>
>>>>>> EMR = Hadoop Ecosystem (Yarn, HDFS, etc) + Spark + EMRFS + Amazon EMR
>>>>>> API + Selected Instance Types + Amazon EC2 Friendly (bootstrapping)
>>>>>> spark-ec2 = HDFS + Yarn (Optional) + Spark (Standalone Default) + Any
>>>>>> Instance Type
>>>>>>
>>>>>> I use spark-ec2 for prototyping and I have never used it for
>>>>>> production.
>>>>>>
>>>>>> just my $0.02
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Dec 1, 2015, at 11:15 AM, Nick Chammas <nicholas.chammas@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Pinging this thread in case anyone has thoughts on the matter they
>>>>>> want to share.
>>>>>>
>>>>>> On Sat, Nov 21, 2015 at 11:32 AM Nicholas Chammas <[hidden email]>
>>>>>> wrote:
>>>>>>
>>>>>>> Spark has come bundled with spark-ec2
>>>>>>> <http://spark.apache.org/docs/latest/ec2-scripts.html> for many
>>>>>>> years. At the same time, EMR has been capable of running Spark for a
>>>>>>> while, and earlier this year it added "official" support
>>>>>>> <https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/>.
>>>>>>>
>>>>>>> If you're looking for a way to provision Spark clusters, there are
>>>>>>> some clear differences between these 2 options. I think the biggest one
>>>>>>> would be that EMR is a "production" solution backed by a company, whereas
>>>>>>> spark-ec2 is not really intended for production use (as far as I know).
>>>>>>>
>>>>>>> That particular difference in intended use may or may not matter to
>>>>>>> you, but I'm curious:
>>>>>>>
>>>>>>> What are some of the other differences between the 2 that do matter
>>>>>>> to you? If you were considering these 2 solutions for your use case at one
>>>>>>> point recently, why did you choose one over the other?
>>>>>>>
>>>>>>> I'd be especially interested in hearing about why people might
>>>>>>> choose spark-ec2 over EMR, since the latter option seems to have shaped up
>>>>>>> nicely this year.
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>>
>>>
>>
>
