mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Mahout on the cloud
Date Thu, 23 Jul 2015 16:33:14 GMT
Just to be clear, Mahout runs on AWS just fine. Dmitriy is talking about continued support
for “MapReduce”, meaning Hadoop MapReduce. For more than a year we have accepted only code
for the more modern engines, so most of the modern Mahout is written in Scala and runs
on Spark. The MapReduce paradigm is certainly supported there, but it runs on Spark, so any
EMR instances you create should have Spark installed.

Amazon now supports Spark on EMR: https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/

Make sure you use the correct version of Spark with Mahout. Mahout 0.10.0 supports Spark 1.1.1
or earlier, Mahout 0.10.1 supports Spark 1.2.1 or earlier, and the current master snapshot
supports Spark 1.3 and runs on Spark 1.4.
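
As a rough illustration of that version check, here is a minimal sketch in Python. The table only transcribes the two released versions mentioned above; the master snapshot is left out because its upper bound is not stated precisely. The names `MAX_SPARK_FOR_MAHOUT`, `parse_version`, and `spark_is_supported` are hypothetical helpers, not part of Mahout or Spark.

```python
# Highest Spark version each Mahout release supports, as stated in this message.
# Hypothetical table and helper names; not part of any Mahout or Spark API.
MAX_SPARK_FOR_MAHOUT = {
    "0.10.0": (1, 1, 1),
    "0.10.1": (1, 2, 1),
}


def parse_version(text):
    """Turn a dotted version string such as '1.2.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def spark_is_supported(mahout_version, spark_version):
    """Return True if spark_version is at or below the limit for mahout_version."""
    if mahout_version not in MAX_SPARK_FOR_MAHOUT:
        raise KeyError("no compatibility data for Mahout " + mahout_version)
    return parse_version(spark_version) <= MAX_SPARK_FOR_MAHOUT[mahout_version]


if __name__ == "__main__":
    for mahout, spark in [("0.10.0", "1.1.1"), ("0.10.0", "1.2.0"), ("0.10.1", "1.2.1")]:
        print(mahout, spark, spark_is_supported(mahout, spark))
```

Something like this could run before launching a cluster, so a mismatched Spark install is caught early rather than at job time.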

On Jul 23, 2015, at 7:28 AM, Ankit Goel <ankitgoel2004@gmail.com> wrote:

Thanks for the heads up, Dmitriy. That's exactly the kind of warning I was
looking for. I don't have any experience implementing MR yet (I understand
the algo perfectly), so this is a great heads up. Any advice or warnings
on Hadoop installations and versions?

On Thu, Jul 23, 2015 at 6:34 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> The MapReduce parts are entering de facto end-of-life. It is not that we
> specifically don't want to support them; it is that, de facto, nobody bothers
> to support them. Risks are especially high with new versions of Hadoop and EMR.
> 
> That said, we'd be grateful for any guide about doing this in EMR.
> 
> On Wed, Jul 22, 2015 at 5:53 PM, Ankit Goel <ankitgoel2004@gmail.com>
> wrote:
> 
>> Hi,
>> After my runs on my laptop, I'm ready to port my work to the cloud. I'm
>> planning to use Amazon. One thing I noticed when I started with Mahout was
>> that a lot of things were left unsaid on the site/wiki, and they took me a
>> lot of time to figure out. Pitfalls, if I may call them that. I will
>> primarily be using clustering on the cloud, so the code to accept new data
>> and run it is what I have for now.
>> 
>> So before I port to the cloud, are there any things I should beware of or
>> look out for? Is AWS fine with Mahout? Are there any configurations I
>> should remember? Any advice on implementation to ease my transition and
>> run Mahout 24 hours a day? Thanks
>> 
>> --
>> Regards,
>> Ankit Goel
>> http://about.me/ankitgoel
>> 
> 



-- 
Regards,
Ankit Goel
http://about.me/ankitgoel

