mahout-user mailing list archives

From Jeff Eastman <>
Subject Re: Mahout & Hadoop
Date Sat, 02 Oct 2010 17:10:35 GMT
  On 10/2/10 11:46 AM, Latency Buster wrote:
>> What did you want to do with Mahout?  How much data do you have?
>> There are many capabilities that don't use Hadoop, some that require it.
>> Others allow you to choose to use Hadoop only when you need to scale to
>> large volumes.
> I have around 50GB data and need to do some data mining.. I do not
> need realtime like performance and can live with slow performance...
> Can I assume that Hadoop is a 'not required' item in my case?
> Thanks,
It depends upon what sort of data mining you want to do. FPGrowth and 
most of the clustering jobs have sequential operation as an option. If 
you have a multicore machine you may see performance improvements using 
Hadoop even on a single box. Some of the Mahout jobs only run on Hadoop. 
It's not that hard to bring up on a single machine. If you can borrow 
some cycles and disk space on other machines (I've been successful 
running Hadoop in the background on others' dev machines that were not 
heavily loaded while they were being used in the foreground for normal 
builds, etc.), it's pretty exciting to see the performance scale almost 
linearly with cores :)
