mahout-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: 40 hours to run 1/2 Netflix Data?
Date Mon, 14 May 2012 05:44:10 GMT
Hi,

something must be going badly wrong in this experiment. Please use
the latest version of Mahout (Mahout 0.6) and tell us exactly at which
point the job fails.

I have been able to process datasets seven times as large as Netflix
(http://webscope.sandbox.yahoo.com/catalog.php?datatype=r) in a few
hours on a 6 machine cluster.

--sebastian

On 14.05.2012 03:44, 许春玲 wrote:
> Hi,
> 
>    I am running the item-based recommender on the Netflix data, but it always
> fails because there is not enough local disk space. So I cut the data in half
> by user ID (not by number of users) to reduce the temporary data. Now it
> finishes, but it takes 40 hours. The command is as follows:
> 
> hadoop jar /app/mahout-distribution-0.5/core/target/mahout-core-0.5-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
> -Dmapred.map.tasks=196 -Dmapred.reduce.tasks=196 -Dmapred.input.dir=NetFlix_data_new -Dmapred.output.dir=output_netflix8
> 
> My Hadoop cluster:
> 
> 28 nodes
> 16G memory per node
> 8 cores per node
> 250G local disk per node
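
A minimal sketch of the input reduction described in the quoted message, assuming the ratings are stored as plain-text `userID,itemID,rating` lines (the comma-separated format RecommenderJob reads). The function name `filter_by_user_id` and the `max_user_id` cutoff are hypothetical illustrations, not taken from the original message:

```python
from typing import Iterable, Iterator


def filter_by_user_id(lines: Iterable[str], max_user_id: int) -> Iterator[str]:
    """Yield only 'userID,itemID,rating' lines whose userID is <= max_user_id.

    Hypothetical helper: the cutoff is whatever ID splits the user range in half.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The user ID is the first comma-separated field.
        user_id = int(line.split(",", 1)[0])
        if user_id <= max_user_id:
            yield line
```

For example, `list(filter_by_user_id(["1,10,5", "3,11,4"], 2))` keeps only the first line. Filtering by ID range rather than sampling users keeps every rating of each retained user, so per-user histories stay complete.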