mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Error with KMeans example in trunk (793689)
Date Tue, 14 Jul 2009 11:58:52 GMT
Try Hadoop 0.20.0, which is what trunk is now on.  I will update the  
docs.


On Jul 13, 2009, at 7:02 PM, Paul Ingles wrote:

> Hi,
>
> I've been going over the kmeans stuff the last few days to try to
> understand how it works, and how I might extend it to work with the
> data I'm looking to process. It's taken me a while to get a basic
> understanding of things, and I really appreciate having lists like
> this around for support.
>
> I need to be able to label the vectors: each vector holds (for a  
> document) a set of similarity scores across a number of attributes.  
> I did some searching around payloads (after coming across the term  
> in some comments) but couldn't see how I add a payload to the  
> Vector. I then stumbled on MAHOUT-65 (https://issues.apache.org/jira/browse/MAHOUT-65)
> that mentions the addition of the setName method to Vector. I've
> tried building trunk, and although there were a few test failures  
> for other (seemingly unrelated) examples I continued and managed to  
> get the mahout-examples jar/job files built to give it a whirl.
>
> When I run the following:
>
> $ hadoop jar examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>
> I see it run the "Preparing Input", "Running Canopy to get initial  
> clusters", and then finally it starts "Running KMeans". But, shortly  
> after it breaks with the following trace:
>
> ---snip---
> Running KMeans
> 09/07/13 23:49:34 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/canopies Out: output Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
> 09/07/13 23:49:34 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: 1 Input Vectors: org.apache.mahout.matrix.SparseVector
> 09/07/13 23:49:34 INFO kmeans.KMeansDriver: Iteration 0
> 09/07/13 23:49:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 09/07/13 23:49:34 INFO mapred.FileInputFormat: Total input paths to process : 2
> 09/07/13 23:49:34 INFO mapred.JobClient: Running job: job_200907132019_0040
> 09/07/13 23:49:35 INFO mapred.JobClient:  map 0% reduce 0%
> 09/07/13 23:49:42 INFO mapred.JobClient:  map 50% reduce 0%
> 09/07/13 23:49:43 INFO mapred.JobClient:  map 100% reduce 0%
> 09/07/13 23:49:49 INFO mapred.JobClient:  map 100% reduce 100%
> 09/07/13 23:49:50 INFO mapred.JobClient: Job complete: job_200907132019_0040
> 09/07/13 23:49:50 INFO mapred.JobClient: Counters: 16
> 09/07/13 23:49:50 INFO mapred.JobClient:   File Systems
> 09/07/13 23:49:50 INFO mapred.JobClient:     HDFS bytes read=465629
> 09/07/13 23:49:50 INFO mapred.JobClient:     HDFS bytes written=5631
> 09/07/13 23:49:50 INFO mapred.JobClient:     Local bytes read=7806
> 09/07/13 23:49:50 INFO mapred.JobClient:     Local bytes written=15674
> 09/07/13 23:49:50 INFO mapred.JobClient:   Job Counters
> 09/07/13 23:49:50 INFO mapred.JobClient:     Launched reduce tasks=1
> 09/07/13 23:49:50 INFO mapred.JobClient:     Launched map tasks=2
> 09/07/13 23:49:50 INFO mapred.JobClient:     Data-local map tasks=2
> 09/07/13 23:49:50 INFO mapred.JobClient:   Map-Reduce Framework
> 09/07/13 23:49:50 INFO mapred.JobClient:     Reduce input groups=7
> 09/07/13 23:49:50 INFO mapred.JobClient:     Combine output records=10
> 09/07/13 23:49:50 INFO mapred.JobClient:     Map input records=600
> 09/07/13 23:49:50 INFO mapred.JobClient:     Reduce output records=7
> 09/07/13 23:49:50 INFO mapred.JobClient:     Map output bytes=465600
> 09/07/13 23:49:50 INFO mapred.JobClient:     Map input bytes=448580
> 09/07/13 23:49:50 INFO mapred.JobClient:     Combine input records=600
> 09/07/13 23:49:50 INFO mapred.JobClient:     Map output records=600
> 09/07/13 23:49:50 INFO mapred.JobClient:     Reduce input records=10
> 09/07/13 23:49:50 WARN kmeans.KMeansDriver: java.io.IOException: Cannot open filename /user/paul/output/clusters-0/_logs
> java.io.IOException: Cannot open filename /user/paul/output/clusters-0/_logs
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1394)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1385)
> 	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:338)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:171)
> 	at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
> 	at org.apache.mahout.clustering.kmeans.KMeansDriver.isConverged(KMeansDriver.java:304)
> 	at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:241)
> 	at org.apache.mahout.clustering.kmeans.KMeansDriver.runJob(KMeansDriver.java:194)
> 	at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:100)
> 	at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
> 	at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> 	at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> ---snip---
>
> This is against revision 793689, running on my development Mac Pro  
> (pseudo-distributed single node) with Hadoop 0.19.1.
>
> It's a bit late to be digging through what's going on, but I will try
> to take a look tomorrow. I'm really excited about giving kmeans a whirl
> on the document processing I'm playing with. In the meantime, I was
> wondering whether anyone else had seen the same error, or knew a way to
> accomplish something similar with the released version (or could point
> me to a known-good past revision, perhaps?)
>
> Thanks again,
> Paul

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

