mahout-user mailing list archives

From Christoph Brücke <christoph.brue...@campus.tu-berlin.de>
Subject Re: Canopy Generation
Date Mon, 27 Jun 2011 09:12:28 GMT
Hi,

Usually, depending on the input data, there should be more than one cluster. You can use
the cluster dumper utility to output the cluster data (https://cwiki.apache.org/confluence/display/MAHOUT/Cluster+Dumper).


It seems that the t1 and t2 thresholds for your canopies were chosen too large, so that all data
points are assigned to just one canopy. Could you describe your input data (number of dimensions,
range, distribution, ...) and give the parameters you used for the clustering?
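
To illustrate the effect of the thresholds, here is a minimal Python sketch of single-pass canopy generation. It is a simplified illustration and not Mahout's actual code: the function name build_canopies, the Euclidean distance, and the toy points are assumptions made for this example. A point joins every canopy whose center is closer than t1, and it only starts a new canopy if no existing center is closer than t2.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_canopies(points, t1, t2, distance=euclidean):
    """Hypothetical single-pass canopy generation (requires t1 > t2).

    Returns a list of (center, members) tuples.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    canopies = []
    for p in points:
        strongly_bound = False
        for center, members in canopies:
            d = distance(p, center)
            if d < t1:
                # Loosely bound: the point belongs to this canopy.
                members.append(p)
            if d < t2:
                # Strongly bound: the point may not start a new canopy.
                strongly_bound = True
        if not strongly_bound:
            # No center is close enough, so this point becomes a new center.
            canopies.append((p, [p]))
    return canopies


# Toy data (assumed): two well-separated groups of two points each.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]

# Thresholds far larger than the data range: everything lands in one canopy.
print(len(build_canopies(points, t1=100.0, t2=50.0)))  # 1

# Thresholds on the scale of the group spacing: one canopy per group.
print(len(build_canopies(points, t1=1.0, t2=0.5)))  # 2
```

With thresholds much larger than the spread of the data, the first point processed absorbs all the others, which would match the single C-0 entry you are seeing.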

Regards,
Christoph

On 27.06.2011 at 00:40, Mark wrote:

> Is there an easy way to know how many canopies were generated after running the canopy
> generation tool?
> 
> I tried viewing the file clusters-0/part-r-00000 via seqdumper but it always returns:
> 
> Key: C-0: Value: C-0: {437:0.005630003188145648,478:0.006034746778989781,591:0.020761514762446885...
> Count: 1
> 
> Should there be multiple key value pairs or just this one?
> 
> Thanks
> 
> 


