spark-issues mailing list archives

From "Andrew Musselman (JIRA)" <>
Subject [jira] [Commented] (SPARK-4259) Add Power Iteration Clustering Algorithm with Gaussian Similarity Function
Date Fri, 30 Jan 2015 19:07:34 GMT


Andrew Musselman commented on SPARK-4259:

Makes sense; does that pull request contain a working version?

> Add Power Iteration Clustering Algorithm with Gaussian Similarity Function
> --------------------------------------------------------------------------
>                 Key: SPARK-4259
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Fan Jiang
>            Assignee: Fan Jiang
>              Labels: features
> In recent years, power iteration clustering has become one of the most popular modern
> clustering algorithms. It is simple to implement, can be solved efficiently by standard linear
> algebra software, and very often outperforms traditional clustering algorithms such as
> k-means.
> Power iteration clustering is a scalable and efficient algorithm for clustering points
> given pointwise mutual affinity values. Internally, the algorithm:
> 1. computes the Gaussian similarity between all pairs of points and stores these values
>    in an affinity matrix;
> 2. normalizes the affinity matrix by dividing each row by its sum;
> 3. approximates the principal eigenvector of the normalized matrix by power iteration,
>    stopping early;
> 4. clusters the input points according to their components of that vector.
> Details of this algorithm can be found in "Power Iteration Clustering" by Lin and Cohen.
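As a rough illustration of the steps above, here is a small, dense, single-machine sketch in plain Python. It is not the MLlib implementation proposed in this issue; the function names, the bandwidth `sigma`, the degree-based starting vector, the early-stopping threshold, and the 1-D k-means used in the last step are all assumptions chosen for the example. Because the iteration stops early, the vector that gets clustered only approximates the principal eigenvector.

```python
import math

# Hypothetical helper names; the degree-based start vector, the tol / n
# stopping threshold, and the 1-D k-means step are assumptions for this sketch.


def gaussian_affinity(points, sigma=1.0):
    """Step 1: A[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)), with A[i][i] = 0."""
    n = len(points)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((p - q) ** 2 for p, q in zip(points[i], points[j]))
            a[i][j] = a[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return a


def truncated_power_iteration(a, max_iter=100, tol=1e-5):
    """Steps 2-3: iterate v <- W v / ||W v||_1, where W is A with each row divided by its sum."""
    n = len(a)
    # Assumes every point has a nonzero affinity to at least one other point.
    degrees = [sum(row) for row in a]
    w = [[x / d for x in row] for row, d in zip(a, degrees)]
    total = sum(degrees)
    v = [d / total for d in degrees]  # degree-based starting vector (assumption)
    prev_delta = None
    for _ in range(max_iter):
        wv = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in wv)
        new_v = [x / norm for x in wv]
        delta = max(abs(x - y) for x, y in zip(new_v, v))  # per-step change
        v = new_v
        # Stop once the per-step change itself stops changing (assumed threshold tol / n).
        if prev_delta is not None and abs(prev_delta - delta) < tol / n:
            break
        prev_delta = delta
    return v


def kmeans_1d(values, k, iters=50):
    """Step 4: Lloyd's k-means on the 1-D vector, centers started evenly between min and max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * c / (k - 1) for c in range(k)] if k > 1 else [lo]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        # Move each non-empty center to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def power_iteration_clustering(points, k, sigma=1.0):
    a = gaussian_affinity(points, sigma)
    v = truncated_power_iteration(a)
    return kmeans_1d(v, k)


# Toy input: a tight group of 3 points and a tight group of 4 points, far apart.
points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
          (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (5.1, 5.1)]
labels = power_iteration_clustering(points, k=2, sigma=1.0)
print(labels)
```

With these inputs the first three points should share one label and the last four the other. The two groups deliberately differ in size: with a degree-based start, two well-separated groups of identical shape and size end up with nearly the same value in the vector, so a 1-D clustering could not tell them apart. The dense O(n²) affinity matrix is fine for a toy example but not for the scalable setting the issue describes.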

This message was sent by Atlassian JIRA

