mahout-user mailing list archives

From Grant Ingersoll <>
Subject Re: Clustering from DB
Date Wed, 15 Jul 2009 22:04:05 GMT
Very cool!  Would love to hear more if you can share.  Getting use cases
and powered-by info out to the public is one of the key things we can do
to drive adoption and increase Mahout's capabilities.

On Jul 15, 2009, at 5:46 PM, zaki rahaman wrote:

> I'm still prototyping something to make sure it works before I start
> working on rolling it out for a large (~500GB) backlog of server data
> that I want to work with. As such, I haven't looked seriously into
> using EC2 until the test runs work well, but I plan on doing so in the
> next couple of days. I'd be more than happy to write a script to run a
> Job or work on a Mahout AMI config.
> On Wed, Jul 15, 2009 at 5:40 PM, Grant Ingersoll  
> <>wrote:
>> On Jul 15, 2009, at 5:25 PM, zaki rahaman wrote:
>>> I hope I'm understanding your setup correctly, but by running on one
>>> machine you're not fully exploiting the capabilities of Hadoop's
>>> Map/Reduce. Gains in computation time will only be seen by increasing
>>> the number of cores or nodes.
>> Yep.
>>> If you need access to more computing power, you might want to
>>> consider using Amazon's EC2 (they have preconfigured AMIs for Hadoop,
>>> but you'd have to configure and install Mahout, a process which I'm
>>> not totally familiar with as of yet as I'm still trying to do it
>>> myself).
>> Please add to if you can.
>> Given a Hadoop AMI, it shouldn't be all that hard to set up a Job, I
>> wouldn't think.  Would be good to have a script that does it, though.
>> -Grant
> -- 
> Zaki Rahaman
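[The launcher script discussed above might be sketched roughly as follows.
Every host name, path, jar name, and driver class here is an assumption for
illustration, not a tested recipe; it prints the commands it would run so
they can be reviewed before anything is executed.]

```shell
#!/bin/sh
# Hypothetical sketch: stage a Mahout job jar onto a Hadoop/EC2 master
# node and kick off a clustering run. All names below are assumed.

MAHOUT_JOB="mahout-examples-job.jar"    # job jar from the Mahout build
MASTER="hadoop-master"                  # assumed alias for the EC2 master
INPUT="hdfs:///user/hadoop/vectors"     # pre-vectorized input data
OUTPUT="hdfs:///user/hadoop/clusters"   # where cluster output lands

# Build the commands as strings first so they can be printed and reviewed.
COPY_CMD="scp $MAHOUT_JOB $MASTER:/tmp/"
RUN_CMD="ssh $MASTER hadoop jar /tmp/$MAHOUT_JOB org.apache.mahout.clustering.kmeans.KMeansDriver -i $INPUT -o $OUTPUT -k 10 -x 20"

# Dry run: print what would be executed. Replace the echos with the bare
# commands to actually copy the jar and launch the job.
echo "$COPY_CMD"
echo "$RUN_CMD"
```

[The dry-run-by-default shape makes it safe to iterate on locally before
pointing it at a real cluster.]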

Grant Ingersoll

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
