mahout-user mailing list archives

From Rahul Mishra <mishra.rah...@gmail.com>
Subject Clustering large files using hadoop?
Date Wed, 19 Sep 2012 06:17:14 GMT
I have been able to cluster and generate results for small CSV files (having
only continuous values) on a local system using Eclipse, and it works
smoothly.
In the process, I have been able to vectorize the data points and to take the
clustering results of K-Means and feed them as the initial centroids to
Fuzzy K-Means clustering.
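
For context, the vectorization I am referring to is along these lines (a
rough sketch only, not my exact code; the paths, key names, and CSV layout
are placeholders). It writes each CSV row as a VectorWritable into a
SequenceFile on HDFS, which is the input format the clustering drivers read:

// Sketch: CSV rows of continuous values -> SequenceFile<Text, VectorWritable>
// on HDFS. Paths and the CSV layout below are placeholders.
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;

public class CsvToVectors {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path csvIn = new Path("/user/rahul/input/data.csv");      // placeholder
    Path vecOut = new Path("/user/rahul/vectors/part-00000"); // placeholder

    SequenceFile.Writer writer =
        new SequenceFile.Writer(fs, conf, vecOut, Text.class, VectorWritable.class);
    BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(csvIn)));
    try {
      String line;
      long row = 0;
      while ((line = reader.readLine()) != null) {
        // assume a plain comma-separated row of doubles, no header
        String[] fields = line.split(",");
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
          values[i] = Double.parseDouble(fields[i]);
        }
        NamedVector vec = new NamedVector(new DenseVector(values), "row-" + row);
        writer.append(new Text(vec.getName()), new VectorWritable(vec));
        row++;
      }
    } finally {
      reader.close();
      writer.close();
    }
  }
}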

But, in the end, I am able to do this only for small files. For files having
2 million rows, it simply fails with an out-of-memory error.
Since Mahout is meant for large-scale machine learning, how do I convert my
code to use the MapReduce framework of Hadoop? [info: I have access to a
3-node Hadoop cluster]
Can anyone suggest a step-by-step procedure? A rough sketch of what I am
imagining is below.
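
To make the question concrete, this is roughly the direction I have in mind
(a sketch only, not working code): my understanding is that passing
runSequential = false to the drivers should make them submit MapReduce jobs
to the cluster instead of running in-process. All paths and parameter values
(k, convergence delta, iterations, fuzziness m) are placeholders, and the
exact run(...) argument lists seem to change between Mahout releases -- I am
going by my reading of the 0.7 drivers, so they may well need adjusting:

// Sketch: seed k random clusters, run K-Means as MapReduce jobs, then use the
// final K-Means clusters as the starting point for Fuzzy K-Means.
// Driver signatures assumed from Mahout 0.7; verify against your version.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.clustering.kmeans.RandomSeedGenerator;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;

public class ClusterOnHadoop {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path vectors = new Path("/user/rahul/vectors");     // SequenceFile<Text,VectorWritable>
    Path seeds = new Path("/user/rahul/seeds");         // k random seed clusters
    Path kmeansOut = new Path("/user/rahul/kmeans-out");
    Path fkmeansOut = new Path("/user/rahul/fkmeans-out");

    int k = 20; // placeholder
    RandomSeedGenerator.buildRandom(conf, vectors, seeds, k, new EuclideanDistanceMeasure());

    // K-Means as MapReduce jobs (last argument runSequential = false)
    KMeansDriver.run(conf, vectors, seeds, kmeansOut,
        0.01, 10, true, 0.0, false);

    // Feed the final K-Means clusters to Fuzzy K-Means, again as MapReduce jobs
    Path kmeansFinal = fs.globStatus(new Path(kmeansOut, "clusters-*-final"))[0].getPath();
    FuzzyKMeansDriver.run(conf, vectors, kmeansFinal, fkmeansOut,
        0.01, 10, 2.0f, true, true, 0.0, false);
  }
}

Is this the right approach, or is there a recommended way (e.g. via the
mahout command-line drivers) to do the same thing on the cluster?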

I have also looked into the clustering chapters of the book "Mahout in
Action" but, to my dismay, did not find a clear answer there.

-- 
Regards,
Rahul K Mishra,
www.ee.iitb.ac.in/student/~rahulkmishra
