mahout-user mailing list archives

From 许春玲 <x...@sari.ac.cn>
Subject Re: Re: 40 hours to run 1/2 Netflix Data?
Date Mon, 14 May 2012 08:08:45 GMT
Ted,
Yes, memory per node is only 16G. Cached memory usage is at 100%, as the
attached file shows, and CPU is at 100% too. The Hadoop temp space on local
disk is 160G at most, and it gets used up completely.
The key point seems to be the sixth step of the recommender, because the job
fails at this step every time.

I have run several tests (logs attached). From the first run to the fifth, I
cut the data size in half each time (as listed below), and every time the job
failed at the sixth step. It still failed even when I cut the data down to
about 100M, roughly the size of the GroupLens movie-ratings file. (For
comparison, running the 100M GroupLens movie ratings takes about 16 minutes.)
-rw-r--r--   3 hdfs supergroup 1505255088 2012-04-20 16:43 /user/hdfs/NetFlix_data
-rw-r--r--   3 hdfs supergroup 1058793314 2012-04-24 10:45 /user/hdfs/netFlixData2
-rw-r--r--   3 hdfs supergroup  793294103 2012-04-26 08:59 /user/hdfs/netFlixData3
-rw-r--r--   3 hdfs supergroup  476054038 2012-04-27 09:51 /user/hdfs/netFlixData4
-rw-r--r--   3 hdfs supergroup  135210043 2012-04-28 13:53 /user/hdfs/netFlixData6

So I cut the user IDs in half (not the number of users, but the user-ID
range) to reduce the size of the matrix. With that change the recommender
finished, but it took about 40 hours.
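In case it is useful, this is roughly the kind of filter I mean (a sketch
only; it assumes the data is already in Mahout's userID,itemID,rating CSV
format, and the cutoff 1324715 is hypothetical, about half of the maximum
Netflix user ID of 2649429):

  # keep only ratings from the lower half of the user-ID range
  # (1324715 is a hypothetical cutoff, not the one actually used)
  hadoop fs -cat NetFlix_data | awk -F, '$1 <= 1324715' | hadoop fs -put - NetFlix_data_new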
The mapred configuration of my cluster is:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>7</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>7</value>
</property>

<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx512M</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx512M</value>
</property>
<property>
  <name>mapred.child.ulimit</name>
  <!-- mapred.child.ulimit is given in kilobytes, not as a JVM flag;
       614400 KB is the 600M intended here -->
  <value>614400</value>
</property>
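
To double-check that these settings actually reach the task JVMs, the child
processes can be inspected on a worker node while a job runs, e.g.:

  # each running task JVM shows its -Xmx flag on its command line
  ps -ef | grep [o]rg.apache.hadoop.mapred.Child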


> -----Original Message-----
> From: "Ted Dunning" <ted.dunning@gmail.com>
> Sent: Monday, May 14, 2012
> To: user@mahout.apache.org, ssc@apache.org
> Cc: 
> Subject: Re: 40 hours to run 1/2 Netflix Data?
> 
> 许春玲,
> 
> The nodes here are relatively under-provisioned with respect to memory.
>  Current standard practice is to provide 4-6 GB per core.  These
> machines have half to a third that much memory.  As a result, it is pretty
> easy to cause swapping if you have too many map or reduce slots configured
> on these machines.  That would be my first suspicion.
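> 
> To put rough numbers on it, using the slot and heap settings from your
> configuration:
> 
>   7 map slots x 512 MB + 7 reduce slots x 512 MB  = ~7 GB of heap
>   + non-heap JVM overhead, ~100-200 MB per task   = ~1.5-3 GB more
>   + the DataNode and TaskTracker daemons
> 
> That leaves only a few GB of the 16 GB for the OS and its page cache,
> so a node with all slots busy can easily tip into swap.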
> 
> A second worry is that you apparently only have a single disk per node.
>  This will substantially slow down your processing.  Even normal Hadoop can
> move 300 MB/s/node with more drives, and optimized systems like MapR can
> move more than 1 GB/s/node.  With a single drive, you are going to be
> severely limited in terms of I/O bandwidth.
> 
> Additionally, any swapping that you are doing is going to eat into that
> bandwidth even further.
> 
> Have you looked at your swap rates, I/O rates, network rates and CPU usage
> during the execution of this program?
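> 
> If not, a quick way to watch all four on a worker node while the job
> runs (a minimal sketch, assuming the standard sysstat tools are
> installed on the nodes) is:
> 
>   # swap activity: non-zero si/so columns mean the node is swapping
>   vmstat 5
>   # per-disk I/O: %util near 100 means the single drive is saturated
>   iostat -x 5
>   # per-interface network throughput
>   sar -n DEV 5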
> 
> On Sun, May 13, 2012 at 10:44 PM, Sebastian Schelter <ssc@apache.org> wrote:
> 
> > Hi,
> >
> > something must be going completely wrong in this experiment. Please use
> > the latest version of Mahout (Mahout 0.6) and tell us exactly at which
> > point the job fails.
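> >
> > A minimal sketch of such a run (assuming the 0.6 job jar sits in the
> > same layout as the 0.5 install, and using the cheap cooccurrence
> > similarity) could look like:
> >
> >   hadoop jar /app/mahout-distribution-0.6/core/target/mahout-core-0.6-job.jar \
> >     org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
> >     --input NetFlix_data_new --output output_netflix_06 \
> >     --similarityClassname SIMILARITY_COOCCURRENCE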
> >
> > I have been able to process datasets seven times as large as Netflix
> > (http://webscope.sandbox.yahoo.com/catalog.php?datatype=r) in a few
> > hours on a 6-machine cluster.
> >
> > --sebastian
> >
> > On 14.05.2012 03:44, 许春玲 wrote:
> > > Hi,
> > >
> > >    I ran the item recommender on Netflix, but it always failed for lack
> > > of local disk space. So I cut the user IDs in half (not the number of
> > > users, but the user-ID range) to reduce the temp data. Now it finishes,
> > > but it takes 40 hours. The command is as follows:
> > >
> > > hadoop jar /app/mahout-distribution-0.5/core/target/mahout-core-0.5-job.jar \
> > >   org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
> > >   -Dmapred.map.tasks=196 -Dmapred.reduce.tasks=196 \
> > >   -Dmapred.input.dir=NetFlix_data_new -Dmapred.output.dir=output_netflix8
> > >
> > > my hadoop cluster:
> > >
> > > 28 nodes
> > > 16G memory per node
> > > 8 core per node
> > > 250G local disk per node
> > >




