mahout-user mailing list archives

From Paritosh Ranjan <pran...@xebia.com>
Subject Re: Mahout Kmeans
Date Sat, 08 Sep 2012 05:35:17 GMT
Can you open a JIRA issue describing the problem and submit a patch with
your fix?
https://issues.apache.org/jira/browse/MAHOUT

On 08-09-2012 09:40, Gustavo Enrique Salazar Torres wrote:
> Never mind, I got it to work; I had to fix the script, though.
>
> Thanks.
> Gustavo
>
> On Fri, Sep 7, 2012 at 5:54 PM, Gustavo Enrique Salazar Torres
> <gsalazar@ime.usp.br> wrote:
>
>> Hi there:
>>
>> I'm trying to finish an improvement to the Kmeans algorithm, but I first
>> need to get it to run in order to compare results.
>> But when I run the cluster-reuters.sh script, I get this error:
>>
>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>> Running on hadoop, using /home/gustavo/Desktop/yandex_data/hadoop-0.20.203.0/bin/hadoop and HADOOP_CONF_DIR=/home/gustavo/Desktop/yandex_data/hadoop-0.20.203.0/conf
>> MAHOUT-JOB: /home/gustavo/Desktop/yandex_data/mahout-distribution-0.7/mahout-examples-0.7-job.jar
>> 12/09/07 17:47:43 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[./reuters-kmeans-clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[./reuters_out_seqdir_kmeans/tfidf-vectors], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[./reuters-kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
>> 12/09/07 17:47:44 INFO common.HadoopUtil: Deleting reuters-kmeans-clusters
>> 12/09/07 17:47:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>> 12/09/07 17:47:44 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
>> 12/09/07 17:47:44 INFO compress.CodecPool: Got brand-new compressor
>> 12/09/07 17:47:44 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to reuters-kmeans-clusters/part-randomSeed
>> 12/09/07 17:47:44 INFO kmeans.KMeansDriver: Input: reuters_out_seqdir_kmeans/tfidf-vectors Clusters In: reuters-kmeans-clusters/part-randomSeed Out: reuters-kmeans Distance: org.apache.mahout.common.distance.CosineDistanceMeasure
>> 12/09/07 17:47:44 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
>> 12/09/07 17:47:44 INFO compress.CodecPool: Got brand-new decompressor
>> Exception in thread "main" java.lang.IllegalStateException: No input clusters found in reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
>> 	at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:218)
>>
>> As you can see, the initial clusters are being created, but for some
>> reason I don't understand, they are not being found.
>> Below is the output of 'cat' on the part file containing the clusters:
>>
>> $ dfs -cat reuters-kmeans-clusters/part-randomSeed
>> SEQ
>> org.apache.hadoop.io.Text org.apache.mahout.clustering.iterator.ClusterWritable
>> org.apache.hadoop.io.compress.DefaultCodec [binary data]
>>
>> Can anyone help me please?
>>
>> Thanks
>> Gustavo Salazar
>>
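The run in Gustavo's log passes --distanceMeasure=CosineDistanceMeasure, --convergenceDelta=0.5 and --maxIter=10. As a rough illustration of what the first two settings control, here is a minimal Java sketch. It assumes cosine distance is 1 minus cosine similarity, and that a cluster counts as converged once its center moves by no more than convergenceDelta, measured with that same distance, between iterations. The class and method names are hypothetical, not Mahout's API, and rejecting zero-length vectors is a choice made only for this sketch.

```java
import java.util.List;

// Hypothetical sketch, not Mahout code: cosine distance plus the kind of
// per-cluster convergence test that --convergenceDelta is assumed to control.
public class KMeansConvergenceSketch {

    // 1 - cosine similarity: 0 for the same direction, 1 for orthogonal, 2 for opposite.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denom = Math.sqrt(normA) * Math.sqrt(normB);
        if (denom == 0) {
            // Sketch-only choice: cosine is undefined for a zero-length vector.
            throw new IllegalArgumentException("zero-length vector");
        }
        return 1.0 - dot / denom;
    }

    // New center = component-wise mean of the points assigned to the cluster.
    static double[] centroid(List<double[]> points) {
        double[] c = new double[points.get(0).length];
        for (double[] p : points) {
            for (int i = 0; i < c.length; i++) {
                c[i] += p[i];
            }
        }
        for (int i = 0; i < c.length; i++) {
            c[i] /= points.size();
        }
        return c;
    }

    // Assumed rule: converged when the center moved no more than delta.
    static boolean converged(double[] oldCenter, double[] newCenter, double delta) {
        return cosineDistance(oldCenter, newCenter) <= delta;
    }

    public static void main(String[] args) {
        double[] oldCenter = {1.0, 0.0};
        List<double[]> assigned = List.of(new double[] {1.0, 1.0}, new double[] {1.0, 0.0});
        double[] newCenter = centroid(assigned);
        double moved = cosineDistance(oldCenter, newCenter);
        System.out.printf("center moved %.4f, converged at delta 0.5: %b%n",
                moved, converged(oldCenter, newCenter, 0.5));
    }
}
```

Because cosine distance ranges from 0 to 2, a delta is only meaningful relative to that range; the same numeric delta means something different under a Euclidean measure.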



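The 'cat' output in Gustavo's message is the binary header of a Hadoop SequenceFile: the magic bytes SEQ and a version byte, then length-prefixed key and value class names (the stray "5" and "*" are the one-byte lengths 53 and 42), two compression flags and, for compressed files, the codec class name. The Java sketch below decodes just those fields so the header can be checked without the binary noise. It assumes the version 6 header layout written by Hadoop's SequenceFile.Writer, reimplements Hadoop's variable-length integer encoding by hand, and stops before the metadata and sync marker. SeqHeaderSketch and its methods are hypothetical names, not part of Hadoop or Mahout.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop or Mahout: decodes the leading fields
// of a SequenceFile header (assumes the version 6 layout; the metadata and sync
// marker that follow the codec name are not read).
public class SeqHeaderSketch {

    static final class Header {
        final int version;
        final String keyClass;
        final String valueClass;
        final boolean compressed;
        final boolean blockCompressed;
        final String codec; // null when the file is not compressed

        Header(int version, String keyClass, String valueClass,
               boolean compressed, boolean blockCompressed, String codec) {
            this.version = version;
            this.keyClass = keyClass;
            this.valueClass = valueClass;
            this.compressed = compressed;
            this.blockCompressed = blockCompressed;
            this.codec = codec;
        }
    }

    // Hand-written decoder for Hadoop's variable-length long encoding.
    static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first; // values in [-112, 127] are stored in a single byte
        }
        boolean negative = first < -120;
        int extraBytes = negative ? -(first + 120) : -(first + 112);
        long value = 0;
        for (int i = 0; i < extraBytes; i++) {
            value = (value << 8) | (in.readByte() & 0xFF); // big-endian payload
        }
        return negative ? ~value : value;
    }

    // A class name is stored as a VInt byte length followed by UTF-8 bytes.
    static String readString(DataInputStream in) throws IOException {
        int len = (int) readVLong(in);
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    static Header parse(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile (missing SEQ magic)");
        }
        int version = in.readByte();
        String keyClass = readString(in);
        String valueClass = readString(in);
        boolean compressed = in.readBoolean();
        boolean blockCompressed = in.readBoolean();
        String codec = compressed ? readString(in) : null;
        return new Header(version, keyClass, valueClass, compressed, blockCompressed, codec);
    }

    public static void main(String[] args) throws IOException {
        // Reads the header from standard input, e.g. piped from `hadoop fs -cat`.
        Header h = parse(System.in);
        System.out.println("version:          " + h.version);
        System.out.println("key class:        " + h.keyClass);
        System.out.println("value class:      " + h.valueClass);
        System.out.println("compressed:       " + h.compressed);
        System.out.println("block compressed: " + h.blockCompressed);
        System.out.println("codec:            " + h.codec);
    }
}
```

Usage, assuming the file is readable from HDFS: `hadoop fs -cat reuters-kmeans-clusters/part-randomSeed | java SeqHeaderSketch`. For Gustavo's file it should report Text keys, ClusterWritable values and DefaultCodec compression. Hadoop's own `hadoop fs -text` can decode whole SequenceFiles instead, provided the key and value classes are on its classpath.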