mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Problem using SNAPSHOT kmeans
Date Mon, 04 Jun 2012 20:40:47 GMT
Hmm, I switched back to Mahout 0.6, and the same command line produced the 
expected results with the same data. No error. I can't find anything on JIRA.

Is anyone else using kmeans from the trunk on real data?

On 6/4/12 9:05 AM, Pat Ferrel wrote:
> Using the kmeans CLI from several trunk versions, I get an error I 
> don't understand. When the job died, 
> b3/canopy-centroids/clusters-0-final contained the random-seeds file 
> generated by the kmeans driver, and b3/kmeans-clusters/clusters-0 
> had several part files, but b3/kmeans-clusters/clusters-1 was empty. 
> When I look through the code from the trace, it doesn't make much sense.
>
> Command line:
> mahout kmeans
>   -i b3/vectors/tfidf-vectors/
>   -k 20
>   -c b3/canopy-centroids/clusters-0-final
>   -cl
>   -o b3/kmeans-clusters
>   -ow
>   -cd 0.01
>   -x 30
>   -dm org.apache.mahout.common.distance.CosineDistanceMeasure
>
> Error:
> 12/06/04 07:55:03 INFO common.AbstractJob: Command line arguments: 
> {--clustering=null, --clusters=[b3/canopy-centroids/clusters-0-final], 
> --convergenceDelta=[0.01], 
> --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], 
> --endPhase=[2147483647], --input=[b3/vectors/tfidf-vectors/], 
> --maxIter=[30], --method=[mapreduce], --numClusters=[20], 
> --output=[b3/kmeans-clusters], --overwrite=null, --startPhase=[0], 
> --tempDir=[temp]}
> 2012-06-04 07:55:03.752 java[67308:1903] Unable to load realm info 
> from SCDynamicStore
> 12/06/04 07:55:03 INFO common.HadoopUtil: Deleting 
> b3/canopy-centroids/clusters-0-final
> 12/06/04 07:55:04 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes 
> where applicable
> 12/06/04 07:55:04 INFO compress.CodecPool: Got brand-new compressor
> 12/06/04 07:55:04 INFO kmeans.RandomSeedGenerator: Wrote 20 vectors to 
> b3/canopy-centroids/clusters-0-final/part-randomSeed
> 12/06/04 07:55:04 INFO kmeans.KMeansDriver: Input: 
> b3/vectors/tfidf-vectors Clusters In: 
> b3/canopy-centroids/clusters-0-final/part-randomSeed Out: 
> b3/kmeans-clusters Distance: 
> org.apache.mahout.common.distance.CosineDistanceMeasure
> 12/06/04 07:55:04 INFO kmeans.KMeansDriver: convergence: 0.01 max 
> Iterations: 30 num Reduce Tasks: org.apache.mahout.math.VectorWritable 
> Input Vectors: {}
> 12/06/04 07:55:04 INFO compress.CodecPool: Got brand-new decompressor
> Cluster Iterator running iteration 1 over priorPath: 
> b3/kmeans-clusters/clusters-0
> 12/06/04 07:55:05 INFO input.FileInputFormat: Total input paths to 
> process : 1
> 12/06/04 07:55:05 INFO mapred.JobClient: Running job: job_local_0001
> 12/06/04 07:55:06 INFO mapred.MapTask: io.sort.mb = 100
> 12/06/04 07:55:08 INFO mapred.MapTask: data buffer = 79691776/99614720
> 12/06/04 07:55:08 INFO mapred.MapTask: record buffer = 262144/327680
> 12/06/04 07:55:08 INFO mapred.JobClient:  map 0% reduce 0%
> 12/06/04 07:55:09 WARN mapred.LocalJobRunner: job_local_0001
> org.apache.mahout.math.IndexException: Index -1 is outside allowable 
> range of [0,20)
>     at org.apache.mahout.math.AbstractVector.set(AbstractVector.java:439)
>     at 
> org.apache.mahout.clustering.iterator.AbstractClusteringPolicy.select(AbstractClusteringPolicy.java:44)
>     at 
> org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:52)
>     at 
> org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:18)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/06/04 07:55:09 INFO mapred.JobClient: Job complete: job_local_0001
> 12/06/04 07:55:09 INFO mapred.JobClient: Counters: 0
> Exception in thread "main" java.lang.InterruptedException: Cluster 
> Iteration 1 failed processing b3/kmeans-clusters/clusters-1
>     at 
> org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:186)
>     at 
> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:229)
>     at 
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:149)
>     at 
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:108)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at 
> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:49)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at 
> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>
>
>
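
For anyone reading the trace above, here is a guess at how an index of -1 could end up in a vector's set call. This is only a sketch under stated assumptions, not code taken from the Mahout source: the class ArgMaxSketch and both of its methods are hypothetical. It assumes the cluster is picked with an argmax that starts at -1, so that if no entry ever compares greater than the running maximum (for instance, if every value were NaN), -1 is returned and the range check fails the way the log shows.

```java
import java.util.Arrays;

public class ArgMaxSketch {
    // Hypothetical argmax, NOT Mahout's implementation: starts at -1 and only
    // moves when an entry compares strictly greater than the running maximum.
    public static int maxValueIndex(double[] values) {
        int best = -1;
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < values.length; i++) {
            if (values[i] > max) { // any comparison against NaN is false
                max = values[i];
                best = i;
            }
        }
        return best;
    }

    // Hypothetical setter with a range check modelled on the message in the
    // trace: the index must lie in [0, size).
    public static void set(double[] target, int index, double value) {
        if (index < 0 || index >= target.length) {
            throw new IndexOutOfBoundsException(
                "Index " + index + " is outside allowable range of [0,"
                    + target.length + ")");
        }
        target[index] = value;
    }

    public static void main(String[] args) {
        // Assumption for illustration: all 20 cluster scores are NaN.
        double[] probabilities = new double[20];
        Arrays.fill(probabilities, Double.NaN);

        int chosen = maxValueIndex(probabilities);
        System.out.println("chosen = " + chosen);

        try {
            set(new double[20], chosen, 1.0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this guess is anywhere near right, the thing to check would be whether the per-cluster values computed for some input vectors are all NaN or otherwise never exceed the starting maximum. Whether that is what actually happens in trunk is not established by the log.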
