spark-user mailing list archives

From Upul Bandara <upulband...@gmail.com>
Subject Re: Discrepancy in PCA values
Date Sat, 10 Jan 2015 02:41:13 GMT
Hi Xiangrui,

Thanks for the reply.

The Julia code is also using the covariance matrix:
(1/n)*X'*X ;
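
For reference, (1/n)*X'*X equals the covariance matrix only when every column of X has zero mean. The thread does not say whether X was centered first, so the following is only a minimal sketch in plain Python, using a made-up 3x2 toy matrix (not the Iris data) and hypothetical helper names, of how the uncentered Gram matrix and the centered covariance differ:

```python
# Sketch only: toy data and helper names are illustrative assumptions,
# not taken from the thread or from MLlib.

def gram(X):
    """Return (1/n) * X' * X without centering the columns."""
    n, d = len(X), len(X[0])
    return [[sum(row[i] * row[j] for row in X) / n for j in range(d)]
            for i in range(d)]

def column_means(X):
    """Return the mean of each column of X."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]

def covariance(X):
    """Return (1/n) * Xc' * Xc, where Xc is X with column means removed."""
    mu = column_means(X)
    centered = [[x - m for x, m in zip(row, mu)] for row in X]
    return gram(centered)

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]  # hypothetical toy data
mu = column_means(X)
G = gram(X)        # what (1/n)*X'*X computes on raw data
C = covariance(X)  # covariance of the centered data

# The two matrices differ by the outer product of the column means:
# G[i][j] == C[i][j] + mu[i] * mu[j]
for i in range(2):
    for j in range(2):
        assert abs(G[i][j] - (C[i][j] + mu[i] * mu[j])) < 1e-9
```

If the columns of X are not centered, the SVD of G and the SVD of C will generally give different directions, which is one possible explanation for the kind of difference discussed here.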

Thanks,
Upul

On Fri, Jan 9, 2015 at 2:11 AM, Xiangrui Meng <mengxr@gmail.com> wrote:

> The Julia code is computing the SVD of the Gram matrix. PCA should be
> applied to the covariance matrix. -Xiangrui
>
> On Thu, Jan 8, 2015 at 8:27 AM, Upul Bandara <upulbandara@gmail.com>
> wrote:
> > Hi All,
> >
> > I tried to do PCA for the Iris dataset
> > [https://archive.ics.uci.edu/ml/datasets/Iris] using MLlib
> > [http://spark.apache.org/docs/1.1.1/mllib-dimensionality-reduction.html].
> > I also calculated PCA in Julia using the following method:
> >
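> > Sigma = (1/size(X, 1)) * X' * X   # (1/n) * X'X, with n = number of rows
> > U, S, V = svd(Sigma)              # requires `using LinearAlgebra` on Julia 0.7+
> > Ureduced = U[:, 1:k]              # first k left singular vectors
> > Z = X * Ureduced                  # project the data onto k components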
> >
> > However, I'm seeing a small difference between the values given by MLlib and
> > the method shown above.
> >
> > Does anyone have any idea about this difference?
> >
> > Additionally, I have attached two visualizations, one for each approach.
> >
> > Thanks,
> > Upul
