mahout-user mailing list archives

From Abramov Pavel <p.abra...@rambler-co.ru>
Subject Compute all distances between Points and KMeans Clusters. VecDist failed
Date Wed, 05 Sep 2012 10:17:21 GMT
Hello!

After running k-means and computing the cluster centroids, I need a matrix (size: <Points>
× <Clusters>) of distances between every point and every cluster. By default, Mahout assigns
each point to exactly one cluster, but in my case I need the distance to each cluster.
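Here is a minimal standalone sketch of the matrix I mean. It uses plain Java arrays, not Mahout's API. The class and method names are hypothetical, and vectors are assumed to be dense and of equal length. Cosine distance is taken as 1 minus cosine similarity, to match the CosineDistanceMeasure chosen in the vecdist command.

```java
// Hypothetical standalone illustration (not Mahout's API): build a
// points x clusters matrix of cosine distances, one row per point and
// one column per centroid.
import java.util.Arrays;

class DistanceMatrix {

    // Cosine distance = 1 - cos(a, b); assumes dense vectors of equal length.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 1.0; // this sketch's own convention for zero vectors
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // distances[p][c] = distance from point p to centroid c
    static double[][] distances(double[][] points, double[][] centroids) {
        double[][] result = new double[points.length][centroids.length];
        for (int p = 0; p < points.length; p++) {
            for (int c = 0; c < centroids.length; c++) {
                result[p][c] = cosineDistance(points[p], centroids[c]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 0}, {1, 1}};    // toy points
        double[][] centroids = {{1, 0}, {0, 1}}; // toy centroids
        for (double[] row : distances(points, centroids)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each row is one point's full set of distances, which is what I am missing, as opposed to a single cluster ID per point.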

For example, I have a document about Formula 1 and two clusters: SPORT and AUTO. Mahout
assigns the document to cluster AUTO, but I need a result like this:
Distance to AUTO 0.6
Distance to SPORT 0.4
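To show the difference between the default assignment and the per-cluster row I need, here is another hypothetical plain-Java sketch. It computes one document's cosine distance to each named centroid, then keeps only the nearest one, the way a hard k-means assignment does. The centroid and document vectors are made-up toy values, not my data, and the names are illustrative only.

```java
// Hypothetical standalone illustration (not Mahout's API): distance from one
// document to every named centroid, plus the hard "nearest cluster" choice.
import java.util.LinkedHashMap;
import java.util.Map;

class DistanceRow {

    // Cosine distance = 1 - cos(a, b); assumes non-zero dense vectors of equal length.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Distance to every centroid, kept in the centroids' insertion order.
    static Map<String, Double> distancesTo(double[] doc, Map<String, double[]> centroids) {
        Map<String, Double> row = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : centroids.entrySet()) {
            row.put(e.getKey(), cosineDistance(doc, e.getValue()));
        }
        return row;
    }

    // A hard assignment throws the row away and keeps only the closest cluster.
    static String nearest(Map<String, Double> row) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : row.entrySet()) {
            if (e.getValue() < bestDistance) {
                bestDistance = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, double[]> centroids = new LinkedHashMap<>();
        centroids.put("AUTO", new double[] {0.9, 0.1});  // toy centroid
        centroids.put("SPORT", new double[] {0.2, 0.8}); // toy centroid
        double[] doc = {0.7, 0.4};                       // toy document vector
        Map<String, Double> row = distancesTo(doc, centroids);
        row.forEach((name, d) -> System.out.println("Distance to " + name + " " + d));
        System.out.println("Hard assignment: " + nearest(row));
    }
}
```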

I tried "vecdist", but it fails with an exception.

Many thanks in advance!

Here is the "vecdist" command I ran:
------------------------------
 ./mahout vecdist \
 --input /tmp/tfidf-vectors/ \
 --seeds /tmp/clustering_results_kmeans/clusters-7-final \
 --output /tmp/adhoc/ \
 --distanceMeasure org.apache.mahout.common.distance.CosineDistanceMeasure \
 --overwrite \
 --outType v
------------------------------

And here is the stack trace:
------------------------------
12/09/05 14:10:28 INFO mapred.JobClient: Task Id : attempt_201207250104_40622_m_000029_0, Status : FAILED
java.lang.IllegalStateException: Bad value class: class org.apache.mahout.clustering.iterator.ClusterWritable
at org.apache.mahout.math.hadoop.similarity.SeedVectorUtil.loadSeedVectors(SeedVectorUtil.java:94)
at org.apache.mahout.math.hadoop.similarity.VectorDistanceInvertedMapper.setup(VectorDistanceInvertedMapper.java:69)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
------------------------------
