mahout-user mailing list archives

From Dmitriy Lyubimov <>
Subject Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"
Date Wed, 29 Mar 2017 16:02:19 GMT
The simplest scheme is to initialize a distributed matrix of the shape
D := (0 | A), where A is your dataset and 0 is a single column holding the
current centroid assignment of each row, and to distribute the current
centroid matrix C via matrix broadcast (assuming there are few enough
centers).
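
To make the layout concrete, here is a minimal sketch in plain Python. It is
a local stand-in, not Mahout API: the toy dataset, the block size, and the
hand-picked starting centroids are all assumptions made for illustration,
and the row blocks only mimic the partitions of a distributed row matrix.

```python
# Plain-Python stand-in for the D := (0 | A) layout; not Mahout API.

# Toy dataset A (assumed for illustration), one point per row.
A = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.8, 8.2]]

# D := (0 | A): column 0 holds the current centroid index, initially 0.
D = [[0.0] + row for row in A]

# Row blocks stand in for the partitions of a distributed row matrix.
block_size = 2  # assumed; a real engine picks its own partitioning
blocks = [D[i:i + block_size] for i in range(0, len(D), block_size)]

# C holds one centroid per row. It is small, so every worker can get a copy.
# Here it is seeded with two hand-picked rows of A (an assumption).
C = [list(A[0]), list(A[2])]
```

Keeping the assignment in column 0 means each row carries its own cluster
label, so a per-block operator can update it without any extra join.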

Then alternately run cluster assignment within the mapBlock() operator on D
and recompute the new centroids C afterwards. The recomputation of
centroids can be done via an aggregating transpose.
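
The alternation can be sketched locally in plain Python. This is not Mahout
code: assign_block() only imitates what a mapBlock() closure would do to one
block, and recompute_centroids() imitates the effect I would expect from an
aggregating transpose, namely summing the rows that share an assignment key
(that reading is an assumption). All function names here are hypothetical.

```python
# Plain-Python simulation of the assignment / recomputation loop; not Mahout API.

def sq_dist(x, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def assign_block(block, C):
    """mapBlock-style step: rewrite column 0 of each row with the index
    of the nearest centroid in the broadcast matrix C."""
    out = []
    for row in block:
        x = row[1:]
        best = min(range(len(C)), key=lambda j: sq_dist(x, C[j]))
        out.append([float(best)] + x)
    return out

def recompute_centroids(blocks, C):
    """Sum the data columns by assignment key, then divide by the counts.
    An empty cluster keeps its old centroid (an assumption)."""
    k, d = len(C), len(C[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for block in blocks:
        for row in block:
            j = int(row[0])
            counts[j] += 1
            for t in range(d):
                sums[j][t] += row[1 + t]
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(C[j])
            for j in range(k)]

def kmeans(blocks, C, max_iter=20):
    """Alternate assignment and recomputation until C stops changing."""
    for _ in range(max_iter):
        blocks = [assign_block(b, C) for b in blocks]
        new_C = recompute_centroids(blocks, C)
        if new_C == C:
            break
        C = new_C
    return blocks, C

# Toy data (assumed), already laid out as (0 | A) in two row blocks.
A = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.8, 8.2]]
blocks = [[[0.0] + r for r in A[:2]], [[0.0] + r for r in A[2:]]]
blocks, C = kmeans(blocks, [[0.0, 0.0], [10.0, 10.0]])
```

In a distributed run only C travels between iterations, which is why the
scheme depends on the number of centers being small.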

Of course, a better scheme includes pre-sketching (k-means||) and use of
the triangle inequality during recomputations.
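
As an illustration of the triangle inequality idea only (k-means|| seeding
is not shown), here is a minimal plain-Python sketch. It relies on the
standard bound that if d(c, c') >= 2 * d(x, c), then c' cannot be closer to
x than c, so the distance to c' need not be computed. The function names
and the toy centroids are assumptions, and this is not Mahout API.

```python
# Nearest-centroid search with triangle-inequality pruning; plain Python sketch.
import math

def dist(x, c):
    """Euclidean distance (a true metric, so the triangle inequality holds)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

def nearest_with_pruning(x, C, cc):
    """Return (index of nearest centroid, number of distance evaluations).
    cc[i][j] is the precomputed distance between centroids i and j."""
    best, best_d, evals = 0, dist(x, C[0]), 1
    for j in range(1, len(C)):
        # If d(C[best], C[j]) >= 2 * d(x, C[best]), C[j] cannot beat C[best].
        if cc[best][j] >= 2.0 * best_d:
            continue
        d = dist(x, C[j])
        evals += 1
        if d < best_d:
            best, best_d = j, d
    return best, evals

# Toy centroids and query point (assumed for illustration).
C = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
cc = [[dist(a, b) for b in C] for a in C]  # computed once per iteration
x = [1.0, 1.0]

best, evals = nearest_with_pruning(x, C, cc)
brute = min(range(len(C)), key=lambda j: dist(x, C[j]))
```

The centroid-to-centroid table costs k * k distances per iteration, which
pays off when there are many more points than centers.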

On Wed, Mar 29, 2017 at 8:30 AM, KHATWANI PARTH BHARAT <> wrote:

> Sir,
> I am trying to write the kmeans clustering algorithm using Mahout Samsara
> but I am a bit confused
> about how to leverage the Distributed Row Matrix for this. Can anybody help
> me with this?
> Thanks
> Parth Khatwani
