spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From RJ Nowling <rnowl...@gmail.com>
Subject Contributing to MLlib: Proposal for Clustering Algorithms
Date Tue, 08 Jul 2014 17:54:00 GMT
Hi all,

MLlib currently has one clustering algorithm implementation, KMeans.
It would benefit from having implementations of other clustering
algorithms such as MiniBatch KMeans, Fuzzy C-Means, Hierarchical
Clustering, and Affinity Propagation.

I recently submitted a PR [1] for a MiniBatch KMeans implementation,
and I saw an email on this list about interest in implementing Fuzzy
C-Means.

Based on Sean Owen's review of my MiniBatch KMeans code, it became
apparent that before I implement more clustering algorithms, it would
be useful to hammer out a framework to reduce code duplication and
implement a consistent API.

I'd like to gauge the interest and goals of the MLlib community:

1. Are you interested in having more clustering algorithms available?

2. Is the community interested in specifying a common framework?

Thanks!
RJ

[1] - https://github.com/apache/spark/pull/1248


-- 
em rnowling@gmail.com
c 954.496.2314

Mime
View raw message