mahout-user mailing list archives

From Rahul Mishra <>
Subject Clustering large files using hadoop?
Date Wed, 19 Sep 2012 06:17:14 GMT
I have been able to cluster small CSV files (containing only continuous
values) and generate results on a local system using Eclipse, and it works.
In the process, I vectorize the data points and feed the K-means clustering
results as the initial centroids to Fuzzy K-means clustering.

However, I can only do this for small files. For files with 2 million rows,
it simply fails with an out-of-memory error.
Since Mahout is meant for large-scale machine learning, how do I convert my
code to use the power of Hadoop's map-reduce framework? [info: I have
access to a 3-node Hadoop cluster.]
Can anyone suggest a step-by-step procedure?
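My understanding is that the same pipeline can be driven from the Mahout
command line so that the jobs run as map-reduce on the cluster rather than
in-memory. A sketch of what I am considering (all paths, cluster counts,
and parameter values below are placeholders, and I am not certain the
options are right):

```shell
# Sketch only: paths and parameter values (-k, -m, -x, -cd) are placeholders.
# With MAHOUT_LOCAL unset and HADOOP_CONF_DIR pointing at the cluster's
# configuration, the mahout driver should submit map-reduce jobs to Hadoop
# instead of running everything locally in one JVM.
export HADOOP_CONF_DIR=/etc/hadoop/conf
unset MAHOUT_LOCAL

# 1. K-means over the vectorized input; -cl also classifies the input
#    points into the final clusters.
mahout kmeans \
  -i /user/rahul/vectors \
  -c /user/rahul/initial-centroids \
  -o /user/rahul/kmeans-output \
  -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
  -k 10 -x 20 -cd 0.5 -cl

# 2. Fuzzy K-means seeded with the K-means centroids
#    (-m is the fuzziness exponent).
mahout fkmeans \
  -i /user/rahul/vectors \
  -c /user/rahul/kmeans-output/clusters-*-final \
  -o /user/rahul/fkmeans-output \
  -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
  -m 1.5 -x 20 -cd 0.5 -cl
```

Is this roughly the right way to move the Eclipse-based workflow onto the
3-node cluster, or does the CSV-to-vector step also need to be rewritten
as a map-reduce job?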

I have also looked into the clustering chapters of the book "Mahout in
Action" but, to my dismay, did not find any clue.

Rahul K Mishra,<>
