mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Clustering large files using hadoop?
Date Wed, 19 Sep 2012 09:33:39 GMT
If your environment variables point at your Hadoop cluster, most Mahout jobs use the cluster
by default. So, if you can run 'hadoop fs' and browse your HDFS cluster, Mahout should find
your Hadoop cluster as well.
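
To make the "environment variables" point concrete, here is a rough sketch of the
decision the launcher seems to make. It assumes the bin/mahout script of this era
checks HADOOP_HOME and MAHOUT_LOCAL; the class and method names are made up for
illustration and are not part of Mahout.

```java
import java.util.Map;

// Hypothetical helper, NOT part of Mahout: mirrors the assumed launcher rule
// "use the cluster if HADOOP_HOME is set and MAHOUT_LOCAL is not".
final class MahoutModeCheck {

    // Returns true when, under the assumed rule, a job would go to the Hadoop cluster.
    static boolean wouldUseCluster(Map<String, String> env) {
        String hadoopHome = env.get("HADOOP_HOME");
        String mahoutLocal = env.get("MAHOUT_LOCAL");
        boolean hasHadoop = hadoopHome != null && !hadoopHome.isEmpty();
        boolean forcedLocal = mahoutLocal != null && !mahoutLocal.isEmpty();
        return hasHadoop && !forcedLocal;
    }

    public static void main(String[] args) {
        // Inspect the real environment of the current process.
        System.out.println(wouldUseCluster(System.getenv()) ? "cluster" : "local");
    }
}
```

If this prints "local" on a machine where 'hadoop fs' works, the variables are
probably only set in a different shell or user profile.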

Lance

----- Original Message -----
| From: "Paritosh Ranjan" <pranjan@xebia.com>
| To: user@mahout.apache.org
| Sent: Tuesday, September 18, 2012 11:28:28 PM
| Subject: Re: Clustering large files using hadoop?
| 
| KMeansDriver has a run method with a runSequential flag. When you
| set it to false, it uses the Hadoop cluster to scale. The kmeans
| command also has this flag.
| 
| "
| 
| In the process, I have been able to vectorize the data points and
| use the clustering results of K-means as the initial centroids
| for Fuzzy K-means clustering.
| 
| "
| You can also use Canopy clustering for initial seeding, as it's a
| single-iteration clustering algorithm and produces good results if
| proper t1, t2 values are provided.
| https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering
| 
| 
| On 19-09-2012 11:47, Rahul Mishra wrote:
| > I have been able to cluster and generate results for small CSV
| > files (having only continuous values) on a local system using
| > Eclipse, and it works smoothly.
| > In the process, I have been able to vectorize the data points and
| > use the clustering results of K-means as the initial centroids
| > for Fuzzy K-means clustering.
| >
| > But in the end I am able to do it only for small files. For files
| > having 2 million rows, it simply fails with an out-of-memory error.
| > Since Mahout is meant for large-scale machine learning, how do I
| > convert my code to use the power of Hadoop's map-reduce framework?
| > [info: I have access to a 3-node cluster running Hadoop]
| > Can anyone suggest a step-by-step procedure?
| >
| > I have also looked into the clustering chapters of the book "Mahout
| > in Action" but, to my dismay, did not find any clue.
| >
| 
| 
| 
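
On the Canopy suggestion quoted above: the following is a minimal in-memory sketch
of the general canopy idea (one pass over the points, with a loose threshold t1 and
a tight threshold t2, t1 > t2). It is not Mahout's CanopyDriver and does not use
Hadoop; the class name, the Euclidean distance, and the choice to return canopy
centroids as seeds are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative only: a plain-Java canopy pass, not the Mahout implementation.
final class CanopySketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns one centroid per canopy; these could serve as initial k-means seeds.
    static List<double[]> canopyCentroids(List<double[]> points, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("t1 must be greater than t2");
        }
        List<double[]> remaining = new ArrayList<>(points);
        List<double[]> centroids = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Take the next unassigned point as the center of a new canopy.
            double[] center = remaining.remove(0);
            double[] sum = center.clone();
            int count = 1;
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = distance(center, p);
                if (d < t1) {
                    // Within the loose threshold: the point belongs to this canopy.
                    for (int i = 0; i < sum.length; i++) {
                        sum[i] += p[i];
                    }
                    count++;
                }
                if (d < t2) {
                    // Within the tight threshold: the point cannot start another canopy.
                    it.remove();
                }
            }
            for (int i = 0; i < sum.length; i++) {
                sum[i] /= count;
            }
            centroids.add(sum);
        }
        return centroids;
    }
}
```

The number of canopies depends entirely on t1 and t2, which is why the quoted advice
stresses choosing them well: a t2 that is too small yields nearly one canopy per
point, while a t2 that is too large merges distinct groups.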
