Hi all,
I'm doing k-means clustering in Mahout using the Tanimoto distance measure.
My input is a set of feature vectors in which the indexes are the features
and the value is 1 for features that exist in the sample and 0 for features
that do not.
(It is actually clustering of users by the documents they read, so each
user has a 1 for every document that he read.)
So the input vectors contain only 0s and 1s.
But the output cluster centers contain double values, not only 0 and 1,
and during the k-means iterations I guess k-means moves the cluster centers
to arbitrary values for all features, not only 0 and 1.
So will the Tanimoto distance measure work in this case?
I think it only gives the Jaccard index when the values are 0 and 1
(otherwise it will not reflect the ratio between the intersection and the
union of the features of the 2 points).
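To make the concern concrete, here is a minimal sketch in plain Java (it is
not Mahout code). It assumes the measure uses the extended Tanimoto formula
1 - a.b / (|a|^2 + |b|^2 - a.b), which I have not verified against the
Mahout source; the class and method names are made up for illustration.

```java
public class TanimotoSketch {

    // Extended (continuous) Tanimoto distance: 1 - a.b / (|a|^2 + |b|^2 - a.b).
    // Assumption: this is the formula behind the distance measure; the real
    // implementation may handle weights and edge cases differently.
    // Both arrays are assumed to have the same length.
    static double tanimotoDistance(double[] a, double[] b) {
        double dot = 0.0, aa = 0.0, bb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            aa += a[i] * a[i];
            bb += b[i] * b[i];
        }
        double denom = aa + bb - dot;
        // Two all-zero vectors: treated as identical (a choice made for this sketch).
        return denom == 0.0 ? 0.0 : 1.0 - dot / denom;
    }

    // Set-based Jaccard distance, meaningful only for 0/1 vectors.
    static double jaccardDistance(double[] a, double[] b) {
        int intersection = 0, union = 0;
        for (int i = 0; i < a.length; i++) {
            boolean inA = a[i] != 0.0, inB = b[i] != 0.0;
            if (inA && inB) intersection++;
            if (inA || inB) union++;
        }
        return union == 0 ? 0.0 : 1.0 - (double) intersection / union;
    }

    // Centroid as k-means computes it: the component-wise mean of the members.
    static double[] centroid(double[][] points) {
        double[] c = new double[points[0].length];
        for (double[] p : points) {
            for (int i = 0; i < c.length; i++) {
                c[i] += p[i] / points.length;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[] u1 = {1, 1, 0, 1}; // user 1 read documents 0, 1 and 3
        double[] u2 = {1, 0, 1, 1}; // user 2 read documents 0, 2 and 3
        double[] c = centroid(new double[][] {u1, u2}); // {1.0, 0.5, 0.5, 1.0}

        System.out.println(tanimotoDistance(u1, u2)); // binary vs binary
        System.out.println(jaccardDistance(u1, u2));  // same pair, set-based
        System.out.println(tanimotoDistance(u1, c));  // binary vs fractional centroid
    }
}
```

For the two binary vectors both functions give 0.5, so under this formula
the measure matches the Jaccard distance on 0/1 input. Against the
fractional centroid the formula still returns a value (1/6 here), but that
value is no longer an intersection-over-union ratio of two sets.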
If I add feature weights, even more of the values given to the distance
measure will not be only 0 or 1.
So will TanimotoDistanceMeasure really work for k-means clustering in Mahout on Hadoop?
See this link for when Tanimoto is really a proper distance measure:
http://en.wikipedia.org/wiki/Jaccard_index
