mahout-user mailing list archives

From Jeff Eastman
Subject Re: Clustering Demo
Date Sat, 17 May 2008 22:35:24 GMT
Grant Ingersoll wrote:
> Anyone have any sample code or demo of running the clustering over a 
> large collection of documents that they could share?  Mainly looking 
> for an example of taking some corpus, converting it into the 
> appropriate Mahout representation and then running either the k-means 
> or the canopy clustering on it.
> Thanks,
> Grant
I've been experimenting with Hadoop deployments on EC2 and have managed 
to deploy a single-node cluster using an AMI I built from the latest 
trunk version (0.18.0). I'm waiting for 0.17.0 to be released, since it 
has much nicer DNS support than 0.16.x for deploying EC2 clusters. At 
that point there should be a public 0.17.0 AMI that we can all use. I 
could probably hack the scripts to make mine work, but this is a little 
out of my comfort zone and 0.17 is imminent.

If we can identify some datasets that can be easily downloaded, I will 
put copies in S3 so that they can be easily copied into the cloud once 
the cluster is ready. I've run canopy over some Apache logs in a previous 
life, but the kinds of datasets under discussion sound much more 
interesting.

