mahout-user mailing list archives

From Stephen Green <>
Subject Re: Amazon Mahout Public AMI v. Mahout on EMR
Date Tue, 19 May 2009 13:12:53 GMT

On May 19, 2009, at 7:11 AM, Grant Ingersoll wrote:

> On May 19, 2009, at 6:59 AM, Tim Bass wrote:
>> Dear All,
>> A few months ago (on the developers' list) we briefly touched on the
>> idea of building a Mahout public AMI on EC2.
>> Subsequently, Amazon released EMR and a number of folks have
>> experimented with running sample Mahout jobs on EMR.
>> What are the pros and cons of creating a public Mahout AMI with Hadoop
>> and MapReduce configured with the versions that are supported by the
>> developers, in addition to Amazon's EMR implementation?
> AFAICT, one issue seems to be that EMR locks you into a specific  
> Hadoop instance.  Not sure if "locks" is too strong, maybe I should  
> say it "encourages" you to use a specific version?

Actually, I think "locks" is more appropriate.  They're using Hadoop  
0.18.3 with some feature backports (according to what they said to  
me), so if you want features from a newer Hadoop (isn't 0.20 the  
current release?  It looked like it had a lot of new stuff), you're  
pretty much done for.

Also, they charge extra for EMR jobs, which strikes me as a bit crazy  
(see Greg Linden's comments about variable pricing), and may strike  
some folks as a reason to run their own clusters.

> As Ted and others pointed out, I think we would benefit from tools  
> that make it easy to add Mahout to an AMI.

Perhaps you could base it off of one of the Cloudera Hadoop AMIs?   
They're publicly available, and they handle all the Hadoop  
business.  I have no idea what the redistribution license would be,  
and I am most definitely not a lawyer!

Stephen Green                      //
Principal Investigator             \\
Aura Project                       //   Voice: +1 781-442-0926
Sun Microsystems Labs              \\   Fax:   +1 781-442-1692
