spark-dev mailing list archives

From rezazadeh <>
Subject [GitHub] incubator-spark pull request: Principal Component Analysis
Date Sat, 08 Feb 2014 21:01:45 GMT
GitHub user rezazadeh opened a pull request:

    Principal Component Analysis

    # Principal Component Analysis
    Computes the top k principal component coefficients for the m-by-n data matrix X. Rows
    of X correspond to observations and columns correspond to variables. The coefficient
    matrix (coeff) is n-by-k. Each column of coeff contains the coefficients for one principal
    component, and the columns are in descending order of component variance. This function
    centers the data and uses the singular value decomposition (SVD) algorithm.
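For reference, the computation described above can be written out in standard PCA notation. The symbols below (the column-mean vector x̄, the centered matrix X_c, and the SVD factors U, Σ, V) are introduced here for illustration and do not appear in the patch; x_i denotes the i-th row of X as a vector in R^n, and the variance uses the m - 1 sample convention.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{m}\sum_{i=1}^{m} x_i \in \mathbb{R}^{n}, \qquad
X_c = X - \mathbf{1}_m \bar{x}^{\top}, \\
X_c &= U \Sigma V^{\top}, \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge 0, \\
\mathrm{coeff} &= V_{:,\,1:k} \in \mathbb{R}^{n \times k}, \qquad
\operatorname{Var}_j = \frac{\sigma_j^{2}}{m-1}.
\end{aligned}
```

Because the singular values are sorted in descending order, taking the first k columns of V yields the columns of coeff in descending order of component variance.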
    # Testing
    Tests included:
     * All principal components
     * Only top k principal components
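The two test cases are related by one property: assuming distinct singular values, the top-k result should match the first k columns of the all-components result. Because an SVD determines each singular vector only up to sign, such a comparison has to accept either a column or its negation. The Scala helpers below are a hypothetical sketch of these checks, not code from the patch; the names `PcaChecks`, `topKMatchesPrefix`, and `orthonormalColumns` are illustrative, and coefficient matrices are assumed to be stored as arrays of rows. The sketch also checks that the columns are orthonormal, which holds for the right singular vectors V.

```scala
// Hypothetical test helpers (not from the patch) for comparing PCA
// coefficient matrices stored row-major as Array[Array[Double]].
object PcaChecks {
  // Extract column j of an n-by-k matrix.
  def column(coeff: Array[Array[Double]], j: Int): Array[Double] = coeff.map(_(j))

  // SVD fixes each singular vector only up to sign, so accept a or -a.
  def sameUpToSign(a: Array[Double], b: Array[Double], tol: Double): Boolean = {
    val plus = a.zip(b).forall { case (x, y) => math.abs(x - y) <= tol }
    val minus = a.zip(b).forall { case (x, y) => math.abs(x + y) <= tol }
    plus || minus
  }

  // True if topK (n-by-k) matches the first k columns of all (n-by-n).
  // Assumes distinct singular values, so each column is unique up to sign.
  def topKMatchesPrefix(all: Array[Array[Double]],
                        topK: Array[Array[Double]],
                        tol: Double = 1e-6): Boolean = {
    val k = topK(0).length
    (0 until k).forall(j => sameUpToSign(column(all, j), column(topK, j), tol))
  }

  // Columns of a coefficient matrix should be mutually orthogonal unit vectors.
  def orthonormalColumns(coeff: Array[Array[Double]], tol: Double = 1e-6): Boolean = {
    val k = coeff(0).length
    (0 until k).forall { i =>
      (0 until k).forall { j =>
        val dot = column(coeff, i).zip(column(coeff, j)).map { case (x, y) => x * y }.sum
        val expected = if (i == j) 1.0 else 0.0
        math.abs(dot - expected) <= tol
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Toy 2-by-2 rotation matrix: both columns are unit length and orthogonal.
    val all = Array(Array(0.6, -0.8), Array(0.8, 0.6))
    // Toy top-1 result whose column is the negation of all's first column.
    val top1 = Array(Array(-0.6), Array(-0.8))
    println(orthonormalColumns(all))
    println(topKMatchesPrefix(all, top1))
  }
}
```

In a real test suite the inputs would come from the PCA routine itself; the toy matrices here only exercise the helpers.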
    # Documentation
    # Example Usage
    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.PCA
    import org.apache.spark.mllib.linalg.SparseMatrix
    import org.apache.spark.mllib.linalg.MatrixEntry

    // Assumes `sc` is an existing SparkContext, e.g. the one spark-shell provides
    // Load and parse the data file: each line is "row,column,value"
    val data = sc.textFile("mllib/data/als/").map { line =>
      val parts = line.split(',')
      MatrixEntry(parts(0).toInt, parts(1).toInt, parts(2).toDouble)
    }
    val m = 4 // number of rows (observations)
    val n = 4 // number of columns (variables)
    val k = 1 // number of principal components to compute

    // Recover the top principal component
    val coeffs = PCA.computePCA(SparseMatrix(data, m, n), k)
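The example above recovers only the top principal component (k = 1) of a distributed matrix. To show what that quantity is on a small in-memory matrix, here is a minimal local Scala sketch; it is not part of the patch, and the names `LocalTopPC`, `center`, `gram`, and `topComponent` are hypothetical. In place of the SVD the patch uses, it swaps in plain power iteration on the n-by-n matrix X_cᵀX_c, whose dominant eigenvector equals the first right singular vector of the centered data (up to sign).

```scala
// Hypothetical local sketch (not the patch's implementation): top principal
// component of a small dense matrix via power iteration on Xc^T Xc.
object LocalTopPC {
  // Subtract each column's mean so that every variable has mean zero.
  def center(x: Array[Array[Double]]): Array[Array[Double]] = {
    val m = x.length
    val n = x(0).length
    val means = Array.tabulate(n)(j => x.map(row => row(j)).sum / m)
    x.map(row => Array.tabulate(n)(j => row(j) - means(j)))
  }

  // Gram matrix G = Xc^T Xc (n-by-n); equals (m - 1) times the sample covariance.
  def gram(xc: Array[Array[Double]]): Array[Array[Double]] = {
    val n = xc(0).length
    Array.tabulate(n, n)((i, j) => xc.map(row => row(i) * row(j)).sum)
  }

  // Scale a vector to unit Euclidean length (assumes it is nonzero).
  def normalize(v: Array[Double]): Array[Double] = {
    val norm = math.sqrt(v.map(a => a * a).sum)
    v.map(_ / norm)
  }

  // Power iteration: repeatedly multiply by G and renormalize.
  // Assumes non-constant data and a start vector not orthogonal to the answer.
  def topComponent(x: Array[Array[Double]], iters: Int = 200): Array[Double] = {
    val g = gram(center(x))
    val n = g.length
    var v = normalize(Array.fill(n)(1.0))
    for (_ <- 1 to iters) {
      val gv = Array.tabulate(n)(i => (0 until n).map(j => g(i)(j) * v(j)).sum)
      v = normalize(gv)
    }
    v
  }

  def main(args: Array[String]): Unit = {
    // Toy 4-by-2 data whose points all lie on the line y = 2x.
    val x = Array(
      Array(1.0, 2.0),
      Array(2.0, 4.0),
      Array(3.0, 6.0),
      Array(4.0, 8.0))
    println(topComponent(x).mkString("[", ", ", "]"))
  }
}
```

For the toy data the component should point along (1, 2)/√5, the direction of the line. Power iteration yields only the leading component and converges slowly when the top two singular values are close, which is one reason an SVD-based approach that returns all k components at once is preferable.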

You can merge this pull request into a Git repository by running:

    $ git pull pca

Alternatively you can review and apply these changes as the patch at:

commit 0642afb2ec1ca6896ffd1a4d3b12eca3f4db52b3
Author: Reza Zadeh <>
Date:   2014-02-02T05:53:33Z

    Initial files

commit 371f40ae288d45986c364adcfe4b584a9b00aa3d
Author: Reza Zadeh <>
Date:   2014-02-08T01:50:59Z

    new interfaces

commit 173148288dffe6cfa1d6671fa8dd9c57499fd0e8
Author: Reza Zadeh <>
Date:   2014-02-08T04:04:46Z

    add option to compute U

commit fb022fcc857bc3bbbb793882587480671b3e0b23
Author: Reza Zadeh <>
Date:   2014-02-08T08:48:24Z

    new tests, SVD interface

commit f756aff7b322504f09236f3ad4e05d4b75e8cc42
Author: Reza Zadeh <>
Date:   2014-02-08T08:49:47Z

    fix tests

commit 2d831f8f734ddf207707b721aa9718ebd7e65ca9
Author: Reza Zadeh <>
Date:   2014-02-08T09:04:48Z

    Documentation, yo

commit 31a5ecf977e6e4e6cd4d038aaa9f3d1ad1b3de49
Author: Reza Zadeh <>
Date:   2014-02-08T09:15:23Z

    added mllib guide docs

commit 57fe6d4ed9e214a504dbb2c5c66205045d5846b5
Author: Reza Zadeh <>
Date:   2014-02-08T09:18:07Z

    SparkPCA example

commit 07657476d3be2bd177090aaa37f6a4357329a188
Author: Reza Zadeh <>
Date:   2014-02-08T09:22:15Z

    fix typo

commit b45c1e88cb36ce2e5c78f493b05455f87ecfc662
Author: Reza Zadeh <>
Date:   2014-02-08T09:23:15Z

    fix example

