spark-user mailing list archives

From GabeChurch
Subject Bisecting Kmeans Linkage Matrix Output (Cluster Indices)
Date Wed, 14 Mar 2018 16:07:57 GMT

I have been working on a project to produce a linkage matrix from the
output of the Spark Bisecting K-means algorithm, so that the selection
steps can be plotted in a dendrogram. I am having trouble returning valid
indices when I use more than 3-4 clusters, and I am hoping someone else
might have the time and interest to take a look.

To achieve this, I modified the Bisecting K-means algorithm to produce a
z-linkage matrix, based on yu-iskw's work. I also modified it to write more
information about the selection steps in the Bisecting K-means algorithm to
the log at run time.
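
For anyone unfamiliar with the index convention involved, here is a minimal
sketch in plain Python. It is not the code from my project. It assumes the
SciPy-style linkage layout: with n leaves numbered 0..n-1, row i merges two
existing clusters and the merged cluster gets index n + i, so a row may only
refer to indices below n + i. The Node class, its height field, and the
to_linkage function are hypothetical names made up for this illustration,
and each leaf is treated as one final cluster.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical node of a strictly binary bisecting tree."""
    height: float = 0.0              # dissimilarity at which the children merge
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def to_linkage(root: Node) -> List[List[float]]:
    """Build rows of [index_a, index_b, height, leaf_count] from a binary tree."""
    # First pass: number the leaves 0..n-1 in left-to-right order.
    leaves: List[Node] = []

    def collect(node: Node) -> None:
        if node.is_leaf:
            leaves.append(node)
        else:
            collect(node.left)
            collect(node.right)

    collect(root)
    n = len(leaves)
    leaf_index = {id(leaf): i for i, leaf in enumerate(leaves)}

    # Second pass: post-order walk, so both children are always emitted
    # before their parent and every row refers only to existing clusters.
    rows: List[List[float]] = []

    def visit(node: Node) -> Tuple[int, int]:
        if node.is_leaf:
            return leaf_index[id(node)], 1
        left_idx, left_count = visit(node.left)
        right_idx, right_count = visit(node.right)
        count = left_count + right_count
        rows.append([float(min(left_idx, right_idx)),
                     float(max(left_idx, right_idx)),
                     float(node.height),
                     float(count)])
        # The cluster created by row i is addressed as n + i.
        return n + len(rows) - 1, count

    visit(root)
    return rows


# Three leaves: the right-hand pair merges at height 1.0, then joins leaf 0.
tree = Node(height=2.0, left=Node(), right=Node(height=1.0, left=Node(), right=Node()))
Z = to_linkage(tree)
```

The post-order walk only guarantees that the indices are valid. It does not
sort rows by height, so if a child split has a larger height than its parent
the dendrogram can still show crossings.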

Test outputs using the Iris dataset with both k = 3 and k = 10 clusters can
be seen in my Stack Overflow post.

The project so far (with a simple sbt build and the compiled jars) can also
be seen on my GitHub repo and is also detailed in the aforementioned Stack
Overflow post.
