spark-dev mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: Contributing to MLlib: Proposal for Clustering Algorithms
Date Tue, 08 Jul 2014 20:24:37 GMT
Having a common framework for clustering makes sense to me.  While we
should be careful about which algorithms we include, having solid
implementations of minibatch clustering and hierarchical clustering seems
like a worthwhile goal, and we should reuse as much code and as many APIs
as is reasonable.


On Tue, Jul 8, 2014 at 1:19 PM, RJ Nowling <rnowling@gmail.com> wrote:

> Thanks, Hector! Your feedback is useful.
>
> On Tuesday, July 8, 2014, Hector Yee <hector.yee@gmail.com> wrote:
>
> > I would say for big data applications the most useful would be
> > hierarchical k-means with backtracking and the ability to support
> > k nearest centroids.
> >
> >
> > On Tue, Jul 8, 2014 at 10:54 AM, RJ Nowling <rnowling@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > MLlib currently has one clustering algorithm implementation, KMeans.
> > > It would benefit from having implementations of other clustering
> > > algorithms such as MiniBatch KMeans, Fuzzy C-Means, Hierarchical
> > > Clustering, and Affinity Propagation.
> > >
> > > I recently submitted a PR [1] for a MiniBatch KMeans implementation,
> > > and I saw an email on this list about interest in implementing Fuzzy
> > > C-Means.
> > >
> > > Based on Sean Owen's review of my MiniBatch KMeans code, it became
> > > apparent that before I implement more clustering algorithms, it would
> > > be useful to hammer out a framework to reduce code duplication and
> > > implement a consistent API.
> > >
> > > I'd like to gauge the interest and goals of the MLlib community:
> > >
> > > 1. Are you interested in having more clustering algorithms available?
> > >
> > > 2. Is the community interested in specifying a common framework?
> > >
> > > Thanks!
> > > RJ
> > >
> > > [1] - https://github.com/apache/spark/pull/1248
> > >
> > >
> > > --
> > > em rnowling@gmail.com
> > > c 954.496.2314
> > >
> >
> >
> >
> > --
> > Yee Yang Li Hector <http://google.com/+HectorYee>
> > *google.com/+HectorYee <http://google.com/+HectorYee>*
> >
>
>
> --
> em rnowling@gmail.com
> c 954.496.2314
>
