spark-user mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: Discrepancy in PCA values
Date Sat, 10 Jan 2015 05:47:32 GMT
You need to subtract mean values to obtain the covariance matrix
(http://en.wikipedia.org/wiki/Covariance_matrix).
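
A minimal sketch of the difference, in plain Python with a made-up 4x2
matrix (not the Iris data; the helper names are only for illustration).
It compares (1/n)*X'*X with the covariance of the mean-centered data,
using the same 1/n normalization as the quoted snippet:

```python
# Sketch: compare (1/n) * X'X with the covariance of mean-centered data.
# The 4x2 matrix below is made-up illustration data, not the Iris dataset.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def scale(m, s):
    return [[s * v for v in row] for row in m]

X = [[1.0, 2.0],
     [3.0, 6.0],
     [5.0, 4.0],
     [7.0, 8.0]]
n = len(X)

# What the quoted snippet computes: the second-moment matrix, no centering.
second_moment = scale(matmul(transpose(X), X), 1.0 / n)

# Column means, then the centered data Xc = X - mean.
means = [sum(col) / n for col in transpose(X)]
Xc = [[v - m for v, m in zip(row, means)] for row in X]

# Covariance with the same 1/n normalization as the quoted snippet.
covariance = scale(matmul(transpose(Xc), Xc), 1.0 / n)

# The two matrices differ by the outer product of the mean vector with itself.
outer = [[a * b for b in means] for a in means]
difference = [[s - c for s, c in zip(rs, rc)]
              for rs, rc in zip(second_moment, covariance)]

print(second_moment)
print(covariance)
print(difference == outer)
```

Unless every column already has zero mean, the two matrices differ, so
their singular vectors (and the projected values) generally differ too.
Dividing by n or by n - 1 only rescales the eigenvalues; it does not
change the principal directions.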

On Fri, Jan 9, 2015 at 6:41 PM, Upul Bandara <upulbandara@gmail.com> wrote:
> Hi Xiangrui,
>
> Thanks for the reply.
>
> Julia code is also using the covariance matrix:
> (1/n)*X'*X ;
>
> Thanks,
> Upul
>
> On Fri, Jan 9, 2015 at 2:11 AM, Xiangrui Meng <mengxr@gmail.com> wrote:
>>
>> The Julia code is computing the SVD of the Gram matrix. PCA should be
>> applied to the covariance matrix. -Xiangrui
>>
>> On Thu, Jan 8, 2015 at 8:27 AM, Upul Bandara <upulbandara@gmail.com>
>> wrote:
>> > Hi All,
>> >
>> > I tried to do PCA for the Iris dataset
>> > [https://archive.ics.uci.edu/ml/datasets/Iris] using MLLib
>> >
>> > [http://spark.apache.org/docs/1.1.1/mllib-dimensionality-reduction.html].
>> > Also, PCA was calculated in Julia using the following method:
>> >
>> > Sigma = (1/numRow(X))*X'*X ;
>> > [U, S, V] = svd(Sigma);
>> > Ureduced = U(:, 1:k);
>> > Z = X*Ureduced;
>> >
>> > However, I'm seeing a small difference between the values given by MLlib
>> > and the method shown above.
>> >
>> > Does anyone have any idea about this difference?
>> >
>> > Additionally, I have attached two visualizations, related to two
>> > approaches.
>> >
>> > Thanks,
>> > Upul
>> >
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> > For additional commands, e-mail: user-help@spark.apache.org
>
>