mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Job failed when running CanopyDriver on large dataset
Date Tue, 08 Dec 2009 20:52:04 GMT
Is there an exception somewhere (check the lower-level Hadoop logs)?  What version of Mahout
are you on?  One of my concerns is that AEM (Amazon Elastic MapReduce) is on Hadoop 0.18.3
(I think) and Mahout is on a later version.
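
To see which task attempts to look up in the lower-level logs, something like the
following could help. It is only a minimal sketch, not part of Mahout or Hadoop: the
function name failed_attempts and the SAMPLE text are made up for illustration, and it
assumes the syslog lines are unwrapped and in the JobClient format quoted below.

```python
import re
from collections import defaultdict

# Matches JobClient lines such as:
#   Task Id : attempt_200912080939_0002_m_000104_0, Status : FAILED
# Groups: job id, task type (m = map, r = reduce), task index, attempt number.
ATTEMPT_RE = re.compile(
    r"Task Id : attempt_(\d+_\d+)_([mr])_(\d+)_(\d+), Status : FAILED"
)


def failed_attempts(log_text):
    """Group failed attempt numbers by (job id, task type, task index)."""
    failures = defaultdict(list)
    for job, kind, task, attempt in ATTEMPT_RE.findall(log_text):
        failures[(job, kind, int(task))].append(int(attempt))
    return dict(failures)


# Illustrative input: the three failure lines from the quoted syslog, unwrapped.
SAMPLE = """\
2009-12-08 09:52:42,432 INFO org.apache.hadoop.mapred.JobClient (main): Task Id : attempt_200912080939_0002_m_000104_0, Status : FAILED
2009-12-08 09:52:48,544 INFO org.apache.hadoop.mapred.JobClient (main): Task Id : attempt_200912080939_0002_m_000104_1, Status : FAILED
2009-12-08 09:52:54,567 INFO org.apache.hadoop.mapred.JobClient (main): Task Id : attempt_200912080939_0002_m_000104_2, Status : FAILED
"""

if __name__ == "__main__":
    for (job, kind, task), attempts in failed_attempts(SAMPLE).items():
        print(f"job_{job} {kind}_{task:06d} failed attempts: {attempts}")
```

On the quoted log this groups all three failures under the same map task (index 104),
which suggests starting with that one task's logs on the node that ran it.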

On Dec 8, 2009, at 1:45 PM, Liang Chenmin wrote:

> Hi,
>   I am running clustering using Mahout on Amazon Elastic MapReduce. The
> canopy clustering step failed at some point. I am using 8 instances of
> m1.xlarge.  The machine that Amazon uses for an xlarge instance is configured
> as follows:
> 
>      Extra Large Instance: 15 GB of memory, 8 EC2 Compute Units (4 virtual
> cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit
> platform
> 
> The step that causes the errors is the Canopy clustering step:
>    2009-12-08 09:52:03,057 INFO
> org.apache.mahout.clustering.canopy.CanopyDriver (main): Input:
> s3://mahout-output/xMDQYCbDtc/data Out: s3://mahout-output/xMDQYCbDtc/canopies
> Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure
> t1: 80.0 t2: 55.0 Vector Class: SparseVector
> 
> And the last few lines of the syslog are as follows:
> 
>   2009-12-08 09:52:03,196 WARN org.apache.hadoop.mapred.JobClient (main):
> Use GenericOptionsParser for parsing the arguments. Applications
> should implement Tool for the same.
> 2009-12-08 09:52:04,014 INFO org.apache.hadoop.mapred.FileInputFormat
> (main): Total input paths to process : 105
> 2009-12-08 09:52:04,222 INFO org.apache.hadoop.mapred.FileInputFormat
> (main): Total input paths to process : 105
> 2009-12-08 09:52:07,301 INFO org.apache.hadoop.mapred.JobClient (main):
> Running job: job_200912080939_0002
> 2009-12-08 09:52:08,304 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 0% reduce 0%
> 2009-12-08 09:52:17,331 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 2% reduce 0%
> 2009-12-08 09:52:18,335 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 5% reduce 0%
> 2009-12-08 09:52:20,340 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 12% reduce 0%
> 2009-12-08 09:52:21,343 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 13% reduce 0%
> 2009-12-08 09:52:22,347 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 17% reduce 0%
> 2009-12-08 09:52:24,363 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 18% reduce 0%
> 2009-12-08 09:52:25,367 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 25% reduce 0%
> 2009-12-08 09:52:26,371 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 28% reduce 0%
> 2009-12-08 09:52:27,374 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 31% reduce 0%
> 2009-12-08 09:52:28,377 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 35% reduce 0%
> 2009-12-08 09:52:29,380 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 36% reduce 0%
> 2009-12-08 09:52:30,383 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 39% reduce 0%
> 2009-12-08 09:52:31,386 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 43% reduce 0%
> 2009-12-08 09:52:32,388 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 45% reduce 0%
> 2009-12-08 09:52:33,392 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 57% reduce 0%
> 2009-12-08 09:52:34,395 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 62% reduce 0%
> 2009-12-08 09:52:35,399 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 69% reduce 0%
> 2009-12-08 09:52:36,409 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 79% reduce 0%
> 2009-12-08 09:52:37,413 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 82% reduce 0%
> 2009-12-08 09:52:38,417 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 90% reduce 0%
> 2009-12-08 09:52:39,420 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 99% reduce 0%
> 2009-12-08 09:52:42,432 INFO org.apache.hadoop.mapred.JobClient (main): Task
> Id : attempt_200912080939_0002_m_000104_0, Status : FAILED
> 2009-12-08 09:52:43,531 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 99% reduce 6%
> 2009-12-08 09:52:48,544 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 99% reduce 13%
> 2009-12-08 09:52:48,544 INFO org.apache.hadoop.mapred.JobClient (main): Task
> Id : attempt_200912080939_0002_m_000104_1, Status : FAILED
> 2009-12-08 09:52:53,564 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 99% reduce 15%
> 2009-12-08 09:52:54,567 INFO org.apache.hadoop.mapred.JobClient (main): Task
> Id : attempt_200912080939_0002_m_000104_2, Status : FAILED
> 2009-12-08 09:52:58,605 INFO org.apache.hadoop.mapred.JobClient (main):  map
> 99% reduce 22%
> 
> I am a newbie to Hadoop and Mahout, and I am seeking some help here. It seems
> that some of the MapReduce jobs fail. Is it because the file size is too
> big? Or are there too many input paths?
> 
> Thanks!

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search

