spark-user mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: Discrepancy in PCA values
Date Thu, 08 Jan 2015 20:41:25 GMT
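The Julia code computes the SVD of the Gram matrix X'*X (scaled by 1/n), so
the data is never mean-centered. PCA should be applied to the covariance
matrix, i.e. computed after subtracting each column's mean from X. -Xiangrui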
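
To make the distinction concrete, here is a minimal pure-Python sketch (no
third-party libraries) that builds both matrices for a small made-up data set
and compares their leading directions. The function names, the toy data, and
the use of power iteration in place of a full SVD are all assumptions made for
illustration; this is not MLlib's implementation, and the 1/(n-1) scaling of
the covariance is likewise an assumption (the scaling changes the eigenvalues
but not the directions).

```python
def column_means(X):
    """Mean of each column of X (X is a list of equal-length rows)."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def scaled_gram(X):
    """(1/n) * X' * X with no centering, as in the quoted snippet."""
    n, d = len(X), len(X[0])
    return [[sum(r[i] * r[j] for r in X) / n for j in range(d)]
            for i in range(d)]


def sample_covariance(X):
    """(1/(n-1)) * Xc' * Xc, where Xc is X with column means subtracted."""
    n, d = len(X), len(X[0])
    mu = column_means(X)
    Xc = [[r[j] - mu[j] for j in range(d)] for r in X]
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_direction(A, iters=200):
    """Leading eigenvector of a symmetric PSD matrix via power iteration.

    Starts from the first basis vector, so it assumes that vector is not
    orthogonal to the leading eigenvector (true for the toy data below).
    """
    d = len(A)
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Toy data (hypothetical): points spread along the direction (1, -1),
# located far from the origin.
X = [[10.0, 12.0], [11.0, 11.0], [12.0, 10.0], [13.0, 9.0]]

G = scaled_gram(X)
C = sample_covariance(X)
v_gram = leading_direction(G)  # dominated by the mean of the data
v_cov = leading_direction(C)   # follows the spread of the data

print("leading direction, uncentered:", v_gram)
print("leading direction, covariance:", v_cov)
```

For this toy data the uncentered matrix equals ((n-1)/n) times the covariance
plus the outer product of the column means, so its leading direction points
roughly toward the mean of the data rather than along the spread. When the
columns of X already have zero mean, the two matrices differ only by a scale
factor and give the same directions.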

On Thu, Jan 8, 2015 at 8:27 AM, Upul Bandara <upulbandara@gmail.com> wrote:
> Hi All,
>
> I tried to run PCA on the Iris dataset
> [https://archive.ics.uci.edu/ml/datasets/Iris] using MLlib
> [http://spark.apache.org/docs/1.1.1/mllib-dimensionality-reduction.html].
> I also computed PCA in Julia using the following method:
>
> Sigma = (1/numRow(X))*X'*X ;
> [U, S, V] = svd(Sigma);
> Ureduced = U(:, 1:k);
> Z = X*Ureduced;
>
> However, I'm seeing a small difference between the values given by MLlib and
> the method shown above.
>
> Does anyone have any idea about this difference?
>
> I have also attached two visualizations, one for each approach.
>
> Thanks,
> Upul
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
