mahout-user mailing list archives

From P Kal <ruvi...@gmail.com>
Subject Kmeans - clustering help
Date Fri, 06 Sep 2013 18:05:20 GMT
I'm trying to run k-means clustering on purely numeric data.

This is how my data looks:
header1, header2, header3, header4, header5
0,0,0,0,0
1,3,2,4,5
3,2,4,5,6
.
.
.

(about 3000 rows in total)

For the initial cluster centroids I created another file:
(0,0,0,0,0)
(1,2,3,4,5)
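
To make the intended numeric representation concrete, here is a minimal sketch in plain Python (not Mahout code) of how rows and centroids in the formats shown above map to dense numeric vectors, and how each row would be matched to its nearest centroid. The function names parse_rows, parse_centroids and nearest are hypothetical, and squared Euclidean distance is an assumption; nothing here reflects Mahout's own file formats or API.

```python
import csv
import io


def parse_rows(text, has_header=True):
    """Parse comma-separated numeric rows into lists of floats."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [r for r in reader if r]  # drop blank lines
    if has_header:
        rows = rows[1:]  # skip the header line
    return [[float(v) for v in row] for row in rows]


def parse_centroids(text):
    """Parse lines such as "(1,2,3,4,5)" into lists of floats."""
    centroids = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        centroids.append([float(v) for v in line.strip("()").split(",")])
    return centroids


def nearest(point, centroids):
    """Return the index of the centroid closest to point
    (squared Euclidean distance, an assumption of this sketch)."""
    distances = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    return distances.index(min(distances))


# Sample values copied from the excerpts above.
data = parse_rows(
    "header1, header2, header3, header4, header5\n"
    "0,0,0,0,0\n"
    "1,3,2,4,5\n"
    "3,2,4,5,6\n"
)
centroids = parse_centroids("(0,0,0,0,0)\n(1,2,3,4,5)\n")
assignments = [nearest(row, centroids) for row in data]
```

Each row becomes a five-element vector of floats, which is the kind of input a k-means assignment step works on, as opposed to tokenized text.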

My understanding is that we have to convert these text files to sequence
files and then generate sparse vectors from those sequence files for k-means
clustering.

I used seqdirectory followed by seq2sparse, and ended up with two folders,
one for the input and one for the centroids.

The input folder has the directories generated by seq2sparse from the input
sequence file. Similarly, the centroids folder has the directories generated
by seq2sparse from the centroids sequence file.
This is the command I use to run kmeans:

mahout kmeans --input input/tfidf-vectors --output output -c centroids/tfidf-vectors --maxIter 20

and I get this error:

No input clusters found in centroids/tfidf-vectors Check your -c argument.

The sequence files contain data, but the files generated by seq2sparse are
empty.
Can someone please help?

BTW, all of this is running on HDFS, not in local mode.
