I'm trying to do dimensionality reduction with SSVD and then run the new doc matrix through kmeans.
The Lanczos + ClusterDump test of SVD + kmeans uses Ahat = A^t V^t. Unfortunately this results
in anonymous vectors in clusteredPoints after Ahat is run through kmeans. The doc ids are
lost due to the transpose, I assume?
In any case Dmitriy pointed out that this might have been done because Lanczos does not produce
U.
So I need to do US^1? This would avoid the transpose and should preserve doc/row ids for
kmeans? And doing the PCA in SSVD will weight things properly so I don't need the halfSigma?
Please correct me if I'm wrong.
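
To make the question concrete, here is a minimal NumPy sketch (not Mahout code) of what I mean by keeping row ids and by the choice of Sigma power. The function name reduce_rows, the parameter sigma_power and the doc ids are made up for illustration, and it uses an exact SVD rather than SSVD. Row i of U*Sigma^p belongs to row i of the input, so the ids can be carried along. With power 1 and no rank dropped, the Euclidean distances between docs are the same as in the original space, while power 0.5 rescales them.

```python
import numpy as np

def reduce_rows(A, doc_ids, k, sigma_power=1.0):
    """Return {doc_id: reduced vector} built from U * Sigma**sigma_power.

    Hypothetical helper for illustration only; exact SVD, not SSVD.
    """
    centered = A - A.mean(axis=0)              # PCA-style column mean subtraction
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    # Broadcasting scales column j of U by s[j] ** sigma_power.
    reduced = U[:, :k] * s[:k] ** sigma_power
    # Row order of U matches row order of A, so ids map one to one.
    return {doc_id: reduced[i] for i, doc_id in enumerate(doc_ids)}

rng = np.random.default_rng(0)
A = rng.random((6, 4))
doc_ids = ["doc%d" % i for i in range(6)]

full = reduce_rows(A, doc_ids, k=4, sigma_power=1.0)   # U * Sigma
half = reduce_rows(A, doc_ids, k=4, sigma_power=0.5)   # U * Sigma^0.5

d_orig = np.linalg.norm(A[0] - A[1])
d_full = np.linalg.norm(full["doc0"] - full["doc1"])
d_half = np.linalg.norm(half["doc0"] - half["doc1"])
print(d_orig, d_full, d_half)
```

Once k is smaller than the rank, the distances under U*Sigma are only approximations of the original ones.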
On Sep 5, 2012, at 4:59 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
Yes, I have an option to output U * Sigma^0.5 already.
But strictly speaking, the way PCA space is defined would require just
U*Sigma output. Or is it not worth it?
On Wed, Sep 5, 2012 at 4:56 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> Yes. (AM)V is U \Sigma. You may actually want something like U \sqrt
> \Sigma instead, though.
>
>
> On Wed, Sep 5, 2012 at 4:10 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>
>> Hello,
>>
>> I have a question w.r.t what to advise people in the SSVD manual for PCA.
>>
>> So we have
>>
>> (AM) \approx U \Sigma V^t
>>
>> and strictly speaking, since the SVD is reduced rank, we need to reproject
>> the original data points as
>>
>> Y= (AM)V
>>
>> However, we can assume (AM)V \approx U \Sigma, can't we? I.e. instead of
>> doing the tough job of recomputing (AM)V, we can just advise using U \Sigma,
>> or just U in some cases, can't we?
>>
>> Thanks.
>> d
>>
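
The relation discussed in the quoted messages can be checked numerically with a small sketch. This is NumPy, not the Mahout SSVD job, and it assumes (AM) denotes the mean-centered input, which the thread does not spell out. The variable names are made up for illustration. With an exact SVD, the explicit reprojection (AM)V and the shortcut U \Sigma agree to floating-point precision for any truncation rank k. With a stochastic SVD, U, \Sigma and V are themselves approximations, so the two results may differ by the approximation error.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((8, 5))
AM = A - A.mean(axis=0)        # assumption: (AM) is the mean-centered input

# Exact thin SVD; SSVD would return approximations of these factors.
U, s, Vt = np.linalg.svd(AM, full_matrices=False)

k = 2                          # reduced rank, chosen arbitrarily here
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k].T

Y_projected = AM @ V_k         # Y = (AM)V, the explicit reprojection
Y_shortcut = U_k * s_k         # U * Sigma, scales column j of U_k by s_k[j]

max_diff = np.abs(Y_projected - Y_shortcut).max()
print(max_diff)
```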
