mahout-user mailing list archives

From "Fuhrmann Alpert, Galit"
Subject mahout kmeans not generating clusteredPoint dir?
Date Wed, 10 Jul 2013 09:17:46 GMT

I ran mahout kmeans (using random seeds) on a Hadoop cluster. It ran successfully and created
a directory containing the clusters-* directories, the last of which was clusters-3-final.
However, it did not create the clusteredPoints directory, or at least I cannot find it under the
same directory (or anywhere else).

My call was:
mahout kmeans  -k 4000 -i inputSeq.dat -o outputPath --maxIter 3 --clusters outputSeeds

Is there an extra argument I need to specify for it to generate the clusteredPoints directory?
(BTW, I also can't see outputSeeds. Was it created for the seeds and then deleted?)

According to Mahout in Action:

The k-means clustering implementation creates two types of directories in the output
folder. The clusters-* directories are formed at the end of each iteration: the clusters-0
directory is generated after the first iteration, clusters-1 after the second iteration, and
so on. These directories contain information about the clusters: centroid, standard
deviation, and so on. The clusteredPoints directory, on the other hand, contains the
final mapping from cluster ID to document ID. This data is generated from the output
of the last MapReduce operation.
The directory listing of the output folder looks something like this:
$ ls -l reuters-kmeans-clusters
drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-0
drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-1
drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-2
drwxr-xr-x 4 user 5000 136 Feb 1 18:59 clusteredPoints

Again, my call did not generate the clusteredPoints directory.
I would appreciate your help.

Thanks a lot,

