mahout-user mailing list archives

From Abramov Pavel <>
Subject Compute all distances between Points and KMeans Clusters. VecDist failed
Date Wed, 05 Sep 2012 10:17:21 GMT

After running k-means and computing the cluster centroids, I need a matrix of size <Points>
x <Clusters> holding the distance from every point to every cluster. By default, Mahout assigns
each point to a single cluster; in my case I need the distance to each cluster.

For example, I have a document about Formula 1 and two clusters, SPORT and AUTO. Mahout
assigns the document to the AUTO cluster, whereas I need a result like this:
Distance to AUTO 0.6
Distance to SPORT 0.4
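A minimal sketch of the <Points> x <Clusters> matrix described above, assuming the document vectors and the final centroids have already been loaded into memory as dense double arrays. The class and method names are hypothetical and are not part of Mahout's API. Cosine distance is taken as 1 minus cosine similarity, which is assumed to match what CosineDistanceMeasure computes; treating an all-zero vector as distance 1.0 is also an assumption of this sketch.

```java
/**
 * Hypothetical sketch (not Mahout's API): builds a points x clusters matrix
 * of cosine distances from dense in-memory vectors.
 */
public class PointClusterDistances {

    // Cosine distance = 1 - (a . b) / (|a| * |b|).
    // Assumption: if either vector is all zeros, the distance is defined as 1.0.
    public static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denom = Math.sqrt(normA) * Math.sqrt(normB);
        if (denom == 0.0) {
            return 1.0;
        }
        return 1.0 - dot / denom;
    }

    // distances[p][c] is the distance from point p to centroid c;
    // every cluster gets a value, not only the nearest one.
    public static double[][] distanceMatrix(double[][] points, double[][] centroids) {
        double[][] distances = new double[points.length][centroids.length];
        for (int p = 0; p < points.length; p++) {
            for (int c = 0; c < centroids.length; c++) {
                distances[p][c] = cosineDistance(points[p], centroids[c]);
            }
        }
        return distances;
    }

    public static void main(String[] args) {
        // Toy two-term vectors, made up purely for illustration.
        String[] clusterNames = {"SPORT", "AUTO"};
        double[][] centroids = {{1.0, 0.0}, {0.0, 1.0}};
        double[][] points = {{1.0, 0.0}, {1.0, 1.0}};

        double[][] d = distanceMatrix(points, centroids);
        for (int p = 0; p < d.length; p++) {
            for (int c = 0; c < d[p].length; c++) {
                System.out.printf("point %d -> %s: %.4f%n", p, clusterNames[c], d[p][c]);
            }
        }
    }
}
```

The hard assignment Mahout produces corresponds to picking the smallest value in each row; keeping the whole row gives the per-cluster distances asked for here.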

I tried "vecdist", but it fails with an exception.

Many thanks in advance!

Here is the "vecdist" command I ran:
 ./mahout vecdist \
 --input /tmp/tfidf-vectors/ \
 --seeds /tmp/clustering_results_kmeans/clusters-7-final \
 --output /tmp/adhoc/ \
 --distanceMeasure org.apache.mahout.common.distance.CosineDistanceMeasure \
 --overwrite \
 --outType v

And here is the stack trace:
12/09/05 14:10:28 INFO mapred.JobClient: Task Id : attempt_201207250104_40622_m_000029_0, Status : FAILED
java.lang.IllegalStateException: Bad value class: class org.apache.mahout.clustering.iterator.ClusterWritable
	at org.apache.mahout.math.hadoop.similarity.SeedVectorUtil.loadSeedVectors(
	at org.apache.mahout.math.hadoop.similarity.VectorDistanceInvertedMapper.setup(
	at org.apache.hadoop.mapred.MapTask.runNewMapper(
	at org.apache.hadoop.mapred.Child$
	at Method)
	at org.apache.hadoop.mapred.Child.main(
