mahout-user mailing list archives

From Pat Ferrel <>
Subject Mapping clusteredPoints to clusters
Date Wed, 12 Sep 2012 01:33:36 GMT
Maybe I should reword this since it has nothing to do with SSVD.

When clustering, if you ask the driver to cluster the input vectors after the clusters
are computed, it creates a file called clusteredPoints/part-m-xxx.

It contains pairs of cluster ID and input vector (IntWritable, VectorWritable). When you use NamedVectors
as input vectors, the NamedVectors are stored in clusteredPoints, so you can use the names
to identify the classified vectors.
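To illustrate why the names matter, here is a plain-Java model of the clusteredPoints records. This is not Mahout code: `NamedPoint` and `ClusteredPoint` are hypothetical stand-ins for a NamedVector and for one (IntWritable, VectorWritable) record, and `groupByCluster` is a hypothetical helper. Once the records are grouped by cluster ID, the name is the only field that links each point back to its source document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClusteredPointsSketch {

    // Hypothetical stand-in for a NamedVector: vector values plus a name
    // such as a document ID or row index. Not the real Mahout class.
    record NamedPoint(String name, double[] values) {}

    // Hypothetical stand-in for one record in clusteredPoints:
    // the assigned cluster ID and the (named) input vector.
    record ClusteredPoint(int clusterId, NamedPoint point) {}

    // Collects the point names under each cluster ID, in cluster-ID order.
    // Without names, only the raw values would be left to identify a point.
    static Map<Integer, List<String>> groupByCluster(List<ClusteredPoint> records) {
        Map<Integer, List<String>> byCluster = new TreeMap<>();
        for (ClusteredPoint r : records) {
            byCluster.computeIfAbsent(r.clusterId(), k -> new ArrayList<>())
                     .add(r.point().name());
        }
        return byCluster;
    }

    public static void main(String[] args) {
        List<ClusteredPoint> records = List.of(
            new ClusteredPoint(7, new NamedPoint("doc-1", new double[] {1.0, 0.0})),
            new ClusteredPoint(3, new NamedPoint("doc-2", new double[] {0.0, 1.0})),
            new ClusteredPoint(7, new NamedPoint("doc-3", new double[] {0.9, 0.1})));
        // Prints {3=[doc-2], 7=[doc-1, doc-3]}
        System.out.println(groupByCluster(records));
    }
}
```

With unnamed vectors, the same grouping would yield only lists of raw values per cluster, which is exactly the gap described next.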

However, if you do not create NamedVectors, there appears to be no way to identify the classified
VectorWritables in clusteredPoints. Unless I missed something, there is no way to tie the classified
vectors back to the input objects (docs, in my case).

Do I need to create my own classification to get the row IDs associated with clusters?