mahout-user mailing list archives

From Sebastian Briesemeister
Subject Retrieving Fuzzy Cluster Probabilities
Date Fri, 22 Mar 2013 14:39:36 GMT
Dear all,

I am having trouble retrieving the cluster probabilities of instances:

I am clustering instances using the FuzzyKMeansDriver.
Afterwards, I am reading instances of WeightedVectorWritable from the
clusteredPoints file (e.g. part-m-0).

When I cluster sequentially (no map-reduce), the weights of the vectors
are reasonable probabilities for the clusters. However, when I run
FuzzyKMeansDriver with sequential=false, the weight of each vector is 1
for EVERY cluster, so the weights do not even sum to 1.
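For reference, the fuzzy k-means memberships of a single point should be
non-negative and sum to 1 across all clusters. The minimal sketch below
checks that property in plain Java. The class MembershipCheck, the method
isProbabilityDistribution and the weight values are hypothetical
illustrations. They are not Mahout API and not data from the runs
described here.

```java
import java.util.Map;

public class MembershipCheck {
    // Hypothetical helper: true if the cluster weights recorded for ONE point
    // form a probability distribution (all non-negative, summing to 1 within tol).
    static boolean isProbabilityDistribution(Map<Integer, Double> weightsByClusterId, double tol) {
        double sum = 0.0;
        for (double w : weightsByClusterId.values()) {
            if (w < 0.0) {
                return false; // a negative membership is never valid
            }
            sum += w;
        }
        return Math.abs(sum - 1.0) <= tol;
    }

    public static void main(String[] args) {
        // Illustrative weights that behave like fuzzy memberships.
        Map<Integer, Double> fuzzyLike = Map.of(0, 0.7, 1, 0.2, 2, 0.1);
        // Illustrative weights of 1.0 for every cluster, the pattern described
        // for the sequential=false run.
        Map<Integer, Double> allOnes = Map.of(0, 1.0, 1, 1.0, 2, 1.0);
        System.out.println(isProbabilityDistribution(fuzzyLike, 1e-9));
        System.out.println(isProbabilityDistribution(allOnes, 1e-9));
    }
}
```

Applied per point to the weights read from clusteredPoints, a check like
this flags the all-ones case immediately.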

Am I doing something wrong here?

I tried to work around the problem by using the FuzzyKMeansClusterer:
after clustering, I retrieved the final clusters (class Cluster) and
calculated the distance of every instance to each of the cluster
centers. Then I calculated the probabilities for each cluster using the
computeProbWeight method of FuzzyKMeansClusterer.
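The fuzzy k-means (fuzzy c-means) membership of a point in cluster i is
usually written u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)), where d_k is the
distance from the point to center k and m > 1 is the fuzziness parameter.
The sketch below assumes computeProbWeight evaluates this standard
formula. It is a standalone reimplementation, not Mahout's code, and the
class name, method names and distances are hypothetical.

```java
import java.util.Arrays;

public class FuzzyMembership {
    // Standard fuzzy membership of a point in one cluster:
    //   u = 1 / sum_k (dist / d_k)^(2 / (m - 1))
    // dist is the distance to this cluster's center, allDists the distances
    // to every center (including this one), m > 1 the fuzziness parameter.
    static double membership(double dist, double[] allDists, double m) {
        if (dist == 0.0) {
            return 1.0; // point lies exactly on this center
        }
        double exponent = 2.0 / (m - 1.0);
        double denom = 0.0;
        for (double dk : allDists) {
            // If dk == 0 this term is +Infinity, so the membership becomes 0.
            denom += Math.pow(dist / dk, exponent);
        }
        return 1.0 / denom;
    }

    // Memberships of one point in every cluster; they sum to 1.
    static double[] memberships(double[] dists, double m) {
        double[] u = new double[dists.length];
        for (int i = 0; i < dists.length; i++) {
            u[i] = membership(dists[i], dists, m);
        }
        return u;
    }

    public static void main(String[] args) {
        // Hypothetical distances of one point to three cluster centers.
        double[] dists = {1.0, 2.0, 4.0};
        double[] u = memberships(dists, 2.0);
        System.out.println(Arrays.toString(u));
        System.out.println(Arrays.stream(u).sum());
    }
}
```

With m = 2 the exponent is 2, so each membership is proportional to
1/d_i^2 and the closest center gets the largest share.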

Interestingly, these probabilities differ from the probabilities I get
from the WeightedVectorWritable instances in the clusteredPoints file
when clustering with sequential=true.
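The values this formula produces depend on both the distance measure and
the fuzziness m used, so two computations that differ in either one will
disagree. As an illustration only (the message does not say which
measure or m each computation used, and this is not offered as the
confirmed cause), the sketch below evaluates the standard membership
formula on the same hypothetical point three ways: with Euclidean
distances, with squared Euclidean distances, and with a different m. The
class and method names are hypothetical.

```java
import java.util.Arrays;

public class MembershipSensitivity {
    // Standard fuzzy membership formula, assuming every distance is > 0:
    //   u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    static double[] memberships(double[] dists, double m) {
        double exponent = 2.0 / (m - 1.0);
        double[] u = new double[dists.length];
        for (int i = 0; i < dists.length; i++) {
            double denom = 0.0;
            for (double dk : dists) {
                denom += Math.pow(dists[i] / dk, exponent);
            }
            u[i] = 1.0 / denom;
        }
        return u;
    }

    public static void main(String[] args) {
        // Hypothetical Euclidean distances of one point to three centers.
        double[] euclidean = {1.0, 2.0, 4.0};
        // The same distances squared, as a squared-Euclidean measure would report.
        double[] squared = new double[euclidean.length];
        for (int i = 0; i < euclidean.length; i++) {
            squared[i] = euclidean[i] * euclidean[i];
        }
        System.out.println("Euclidean, m=2: " + Arrays.toString(memberships(euclidean, 2.0)));
        System.out.println("Squared,   m=2: " + Arrays.toString(memberships(squared, 2.0)));
        System.out.println("Euclidean, m=3: " + Arrays.toString(memberships(euclidean, 3.0)));
    }
}
```

Each row still sums to 1, but the individual memberships differ. To
compare two sets of probabilities, both must use the same distance
measure, the same m and the same cluster centers.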

Why do the vector weights differ from the pdfs?

Thank you all in advance,
