The simplest scheme is to initialize a distributed matrix of the shape
D := (0 | A), where A is your dataset and 0 is a single column holding the
current centroid assignment of each row, and to distribute the current
centroid matrix C via matrix broadcast (assuming there are few enough centers).
Then alternate between two steps: run the cluster assignment inside a
mapBlock() operator on D, and recompute the new centroids C afterwards.
The recomputation of centroids can be done via an aggregating transpose.
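
To make the two alternating steps concrete, here is a minimal sketch in
plain Python. It only emulates the scheme on local lists; it is not Mahout
Samsara code and uses no Mahout API. The function names (assign_block,
recompute_centroids, kmeans) and the block_size parameter are made up for
illustration. It also assumes that "aggregating transpose" amounts to
summing the rows of A that share a cluster id (the rows of Z^T A for a
one-hot assignment matrix Z) and then dividing by the cluster sizes.

```python
# Plain-Python emulation of the scheme above; none of this is Mahout API.

def assign_block(block, centroids):
    """Stand-in for the mapBlock() step: for every row of a block of
    D = (0 | A), overwrite column 0 with the index of the nearest centroid."""
    out = []
    for row in block:
        point = row[1:]
        best = min(
            range(len(centroids)),
            key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
        )
        out.append([float(best)] + point)
    return out


def recompute_centroids(blocks, old_centroids):
    """Stand-in for the aggregation step: sum the data rows per cluster id
    and count them, then divide. Assumption: an empty cluster keeps its
    previous centroid."""
    k, dim = len(old_centroids), len(old_centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for block in blocks:
        for row in block:
            j = int(row[0])
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += row[1 + d]
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(old_centroids[j])
        for j in range(k)
    ]


def kmeans(data, centroids, block_size=2, max_iter=20):
    """Alternate assignment and recomputation until the centroids stop moving."""
    # D := (0 | A): prepend an assignment column, then split into row blocks.
    d = [[0.0] + [float(x) for x in row] for row in data]
    blocks = [d[i:i + block_size] for i in range(0, len(d), block_size)]
    for _ in range(max_iter):
        blocks = [assign_block(b, centroids) for b in blocks]
        new_centroids = recompute_centroids(blocks, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    assignments = [int(row[0]) for b in blocks for row in b]
    return centroids, assignments
```

In a distributed setting the list of blocks would be the partitions of D
and the centroids would be the broadcast matrix C, so only the small
per-cluster sums and counts have to travel back to the driver on each pass.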
Of course, a better scheme would include pre-sketching (k-means||) and the
use of the triangle inequality during recomputations.
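
As a rough illustration of the triangle inequality part only (k-means|| is
not shown), here is one common way it can be used to skip distance
computations during assignment. This is a plain-Python sketch under my own
assumptions, not Mahout code: the helper names are hypothetical, and the
pruning rule is the standard one where d(c_best, c_j) >= 2 * d(x, c_best)
implies that c_j cannot be closer to x than c_best.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_center_distances(centroids):
    """Center-to-center distances, computed once per iteration."""
    return [[dist(a, b) for b in centroids] for a in centroids]


def assign_with_pruning(point, centroids, center_dists):
    """Return (index of nearest centroid, number of distance computations skipped)."""
    best = 0
    best_d = dist(point, centroids[0])
    skipped = 0
    for j in range(1, len(centroids)):
        # Triangle inequality: d(c_best, c_j) <= d(x, c_best) + d(x, c_j).
        # So d(c_best, c_j) >= 2 * d(x, c_best) gives d(x, c_j) >= d(x, c_best),
        # and the distance to c_j never needs to be computed.
        if center_dists[best][j] >= 2.0 * best_d:
            skipped += 1
            continue
        d = dist(point, centroids[j])
        if d < best_d:
            best, best_d = j, d
    return best, skipped
```

The center-to-center table is small (k by k), so it could be shipped to the
workers alongside the centroids themselves.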
On Wed, Mar 29, 2017 at 8:30 AM, KHATWANI PARTH BHARAT <
h2016170@pilani.bits-pilani.ac.in> wrote:
> Sir,
> I am trying to write the k-means clustering algorithm using Mahout Samsara,
> but I am a bit confused
> about how to leverage the Distributed Row Matrix for this. Can anybody help
> me with it?
>
> Thanks
> Parth Khatwani
>