mahout-user mailing list archives

From Pat Ferrel
Subject kmeans from 0.6 to 0.7
Date Thu, 07 Jun 2012 17:00:36 GMT
It appears that in Mahout 0.7 kmeans writes the clusteredPoints as 
WeightedVectorWritable, where in Mahout 0.6 they were 
WeightedPropertyVectorWritable. Does this mean the distance from the 
centroid is no longer stored with each point? If so, why? I hope I'm 
wrong, because that is not a welcome change. How is one to order 
clustered docs by distance from the cluster centroid?

I'm sure I could calculate the distance but that would mean looking up 
the centroid for the cluster id given in the above 
WeightedVectorWritable, which means iterating through all the clusters 
for each clustered doc. In my case the number of clusters could be 
fairly large.
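
To make the recalculation concrete, here is a rough sketch of one way it might be done without scanning every cluster per doc: load the centroids once into a map keyed by cluster id, then do a single lookup per clustered doc. Everything in it is an assumption for illustration. ClusteredPoint, ScoredPoint and scoreAndSort are hypothetical names, plain double[] arrays stand in for Mahout vectors, Euclidean distance stands in for whatever distance measure the clustering run used, and the centroids are assumed to fit in memory. It does not use any Mahout classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClusterDistanceSketch {

    // Hypothetical stand-in for one clustered doc: cluster id, doc id, vector.
    static final class ClusteredPoint {
        final int clusterId;
        final String docId;
        final double[] vector;

        ClusteredPoint(int clusterId, String docId, double[] vector) {
            this.clusterId = clusterId;
            this.docId = docId;
            this.vector = vector;
        }
    }

    // Hypothetical result type: a point plus its recomputed distance.
    static final class ScoredPoint {
        final ClusteredPoint point;
        final double distance;

        ScoredPoint(ClusteredPoint point, double distance) {
            this.point = point;
            this.distance = distance;
        }
    }

    // Euclidean distance is an assumption; substitute the measure used for clustering.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // One map lookup per point instead of iterating over all clusters.
    static List<ScoredPoint> scoreAndSort(Map<Integer, double[]> centroids,
                                          List<ClusteredPoint> points) {
        List<ScoredPoint> scored = new ArrayList<>();
        for (ClusteredPoint p : points) {
            double[] centroid = centroids.get(p.clusterId);
            if (centroid == null) {
                throw new IllegalArgumentException("No centroid for cluster " + p.clusterId);
            }
            scored.add(new ScoredPoint(p, euclidean(p.vector, centroid)));
        }
        // Group by cluster id, then order each group by distance from its centroid.
        scored.sort(Comparator
                .comparingInt((ScoredPoint s) -> s.point.clusterId)
                .thenComparingDouble(s -> s.distance));
        return scored;
    }

    public static void main(String[] args) {
        Map<Integer, double[]> centroids = new HashMap<>();
        centroids.put(0, new double[] {0.0, 0.0});
        centroids.put(1, new double[] {10.0, 10.0});

        List<ClusteredPoint> points = new ArrayList<>();
        points.add(new ClusteredPoint(0, "a", new double[] {3.0, 4.0}));
        points.add(new ClusteredPoint(0, "b", new double[] {1.0, 0.0}));
        points.add(new ClusteredPoint(1, "c", new double[] {10.0, 12.0}));

        for (ScoredPoint s : scoreAndSort(centroids, points)) {
            System.out.println(s.point.clusterId + "\t" + s.point.docId + "\t" + s.distance);
        }
    }
}
```

With the map in place, the per-doc cost is one hash lookup plus one distance computation, so the number of clusters only affects the memory needed to hold the centroids.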

Am I missing something?
