Sorry, I think that more commonly, if aggregating transpose is to be used,
the centroid assignments should be the row keys of the matrix D (so D := A),
and the aggregating transpose is performed on the matrix (1 | D)', i.e.
(1 cbind D).t, so that the first row of the result contains the per-cluster
point counts. We can then finish the centroid update via

M = (1 | D)'
C' = M(2:,:), with each row Hadamard-divided by the first row of counts M(1,:)

(implying Golub-Van Loan notation for subblocking).
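
To make the arithmetic concrete, here is a minimal single-machine sketch in
plain Python. It is not Mahout code: the names aggregating_transpose and
recompute_centroids are made up for illustration, and it assumes that an
aggregating transpose sums the rows that share a key.

```python
def aggregating_transpose(keys, rows, k):
    """Transpose `rows`, summing input rows that share a key.

    Input row i is added into output column keys[i], so the result has
    len(rows[0]) rows and k columns.
    """
    width = len(rows[0])
    out = [[0.0] * k for _ in range(width)]
    for key, row in zip(keys, rows):
        for j, value in enumerate(row):
            out[j][key] += value
    return out


def recompute_centroids(keys, data, k):
    """Return (counts, centroids) for points `data` assigned to clusters `keys`."""
    # (1 | D): prepend a column of ones so that point counts get summed too.
    with_ones = [[1.0] + list(row) for row in data]
    m = aggregating_transpose(keys, with_ones, k)  # shape (d + 1) x k
    counts = m[0]
    # Divide each remaining row elementwise by the counts row.
    # An empty cluster is left at zero here; that choice is an assumption.
    c_t = [[s / n if n else 0.0 for s, n in zip(row, counts)] for row in m[1:]]
    # Transpose back so that row j is the centroid of cluster j.
    centroids = [list(col) for col in zip(*c_t)]
    return counts, centroids


keys = [0, 1, 0, 1]
data = [[1.0, 2.0], [10.0, 20.0], [3.0, 4.0], [30.0, 40.0]]
counts, centroids = recompute_centroids(keys, data, k=2)
print(counts)     # [2.0, 2.0]
print(centroids)  # [[2.0, 3.0], [20.0, 30.0]]
```

In the distributed setting the same sums would come out of the transpose
itself, so only the small (d + 1) x k result needs to be collected.
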
On Wed, Mar 29, 2017 at 9:02 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> The simplest scheme is to initialize a distributed matrix of the shape D :=
> (0 | A), where A is your dataset and 0 is a single column holding the
> current centroid assignment, and to distribute the current centroid matrix
> C via matrix broadcast (assuming there are few enough centers).
>
> Then alternately run cluster assignment within the mapBlock() operator on D
> and recompute the new centroids C afterwards. Recomputation of the
> centroids can be done via aggregating transpose.
>
> Of course, a better scheme includes pre-sketching (k-means||) and use of
> the triangle inequality during recomputations.
>
> On Wed, Mar 29, 2017 at 8:30 AM, KHATWANI PARTH BHARAT <
> h2016170@pilani.bitspilani.ac.in> wrote:
>
>> Sir,
>> I am trying to write the k-means clustering algorithm using Mahout Samsara,
>> but I am a bit confused about how to leverage the Distributed Row Matrix
>> for this. Can anybody help me with it?
>>
>>
>>
>>
>>
>> Thanks
>> Parth Khatwani
>>
>
>
