mahout-user mailing list archives

From Christoph Brücke <christoph.brue...@campus.tu-berlin.de>
Subject Re: KMeans and Canopies
Date Sun, 26 Jun 2011 20:54:39 GMT
Hi Mark,

You typically choose a somewhat cheaper distance metric for the canopy clustering when it is
used as a preprocessing step for KMeans. A simple example would be Manhattan distance
(d = |x1 - x2| + |y1 - y2|) for Canopy clustering and Euclidean distance
[d = sqrt((x1 - x2)^2 + (y1 - y2)^2)] for KMeans. This way you get a cheap approximation of
your initial cluster centers.
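
To make the two formulas concrete, here is a minimal sketch in plain Java. It does not use
Mahout's own distance classes; the class name DistanceSketch and its methods are made up
for illustration, and the Euclidean variant keeps the square root as in the formula above.

```java
// Hypothetical helper, not part of Mahout: the two distance formulas from the message.
final class DistanceSketch {

    // Manhattan distance: sum of absolute coordinate differences (no multiply, no sqrt).
    static double manhattan(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Euclidean distance: square root of the sum of squared coordinate differences.
    static double euclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] p = {1.0, 2.0};
        double[] q = {4.0, 6.0};
        System.out.println("manhattan = " + manhattan(p, q)); // prints "manhattan = 7.0"
        System.out.println("euclidean = " + euclidean(p, q)); // prints "euclidean = 5.0"
    }
}
```

For the points (1, 2) and (4, 6) the differences are 3 and 4, so the Manhattan distance is
3 + 4 = 7 and the Euclidean distance is sqrt(9 + 16) = 5.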
I hope this was helpful.

Regards,
Christoph


Am 26.06.2011 um 21:29 schrieb Mark:

> Should canopy generation and KMeans clustering typically use the same distance calculation,
> or is it possible to mix and match? Any reason why some would mix?
> 
> Thanks
> 

Christoph Brücke
christoph.bruecke@campus.tu-berlin.de



