mahout-user mailing list archives

From: Adil Aijaz
Subject: Re: using directory input with kmeans clusterer
Date: Wed, 29 Jul 2009 20:37:53 GMT
You need to extend RandomSeedGenerator to accept a directory instead of 
a single file. You shouldn't have to make significant changes to 
KMeansDriver. I have already made these changes (plus quite a few other 
things that I would like to contribute), but I am currently stuck 
waiting for clearance from my company's Open Source Working Group =(
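
To illustrate the directory-handling idea only, here is a minimal sketch. It is not the actual patch. The class InputPaths and the method listInputFiles are hypothetical names, and the sketch uses java.nio.file on a local filesystem so that it runs standalone. Inside Hadoop, the analogous step would presumably go through FileSystem.listStatus on the input path. Skipping names that start with "_" or "." is an assumption here, chosen to mirror the usual Hadoop convention for hidden files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Mahout: expands an input path into the
// list of files that a seed generator would need to read.
final class InputPaths {
    private InputPaths() {}

    // If 'input' is a directory, return its regular files (sorted by path),
    // skipping names that start with "_" or "." (assumed hidden-file rule).
    // Otherwise return a one-element list containing 'input' itself.
    static List<Path> listInputFiles(Path input) throws IOException {
        List<Path> files = new ArrayList<>();
        if (Files.isDirectory(input)) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(input)) {
                for (Path entry : entries) {
                    String name = entry.getFileName().toString();
                    boolean hidden = name.startsWith("_") || name.startsWith(".");
                    if (Files.isRegularFile(entry) && !hidden) {
                        files.add(entry);
                    }
                }
            }
            // Directory listing order is unspecified, so sort for determinism.
            Collections.sort(files);
        } else {
            files.add(input);
        }
        return files;
    }
}
```

With a helper like this, the seed generator can loop over the returned part files one at a time, so the 8000 parts never have to be merged into a single file.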


Wei Dong wrote:
> Hi All,
> I've successfully clustered sequence files with KMeansDriver, but I 
> haven't been able to pass directories of sequence files as input.  I 
> have a huge dataset (~4TB) stored in about 8000 parts and it will cost 
> a lot of space simply to merge them into a single file.  Do I need to 
> implement my own KMeansDriver?
> Thanks a lot,
> - Wei Dong
