mahout-user mailing list archives

From Viral Parikh
Subject Mahout used for Text Clustering
Date Thu, 04 Dec 2014 19:22:33 GMT
Hi Mahout Users!
I am currently working on text clustering, using Mahout and its clustering algorithms
(kmeans, LDA, canopy, etc.).
I have the following questions:
1. Why is Mahout producing clusters that contain only one observation?
2. Is cluster 1 always a catch-all cluster?
3. When I change k in kmeans and run clusterdump, the total number of observations changes
as k changes. Why is that? Am I missing something?
4. Does normalization (when creating the vectors) improve the quality of clustering results,
especially for unstructured data? In my case, it's text data.

Thank you in advance for your help!
