When using Lanczos, the recommendation is to use the clean eigenvectors as a distributed row matrix;
call it V.
Ahat = A^t V^t, as per the clusterdump tests DSVD and DSVD2.
When using SSVD, Dmitriy and Ted recommend:
Ahat = US
When using PCA it's also preferable to use uHalfSigma to create U with the SSVD solver.
One difficulty is that to perform the multiplication you have to turn the singular values
vector (diagonal values) into a distributed row matrix or write your own multiply function,
correct?
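
To illustrate the multiplication in question: because S is diagonal, U*S only scales column j of U by the j-th singular value, so each row can be handled on its own without building a full diagonal matrix. The following is a minimal pure-Python sketch with made-up toy values, not Mahout code; `scale_columns` and `multiply` are hypothetical helper names.

```python
def scale_columns(u_rows, sigma):
    """Return U * diag(sigma) by scaling entry j of every row by sigma[j]."""
    return [[value * s for value, s in zip(row, sigma)] for row in u_rows]


def multiply(a, b):
    """Plain dense matrix product, used here only to check the shortcut."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical toy values; a real U would be read row by row from storage.
U = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sigma = [10.0, 0.5]

US = scale_columns(U, sigma)

# Same result as materialising the full diagonal matrix first.
diag = [[sigma[i] if i == j else 0.0 for j in range(len(sigma))]
        for i in range(len(sigma))]
assert US == multiply(U, diag)
```

Since every output row depends only on the matching input row and the singular values, the same idea should carry over to a per-row pass over a row-partitioned matrix.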
Questions:
For SSVD, can someone explain why US is preferred? Given A = USV^t, how can you ignore the effect
of V^t? Is this only for PCA? In other words, if you did not use PCA weighting, would you ignore
V^t?
For Lanczos, Ahat = A^t V^t seems to strip the doc ids during the transpose; am I mistaken? Also,
shouldn't Ahat be transposed before performing k-means or other analysis?
> Dmitriy said
With SSVD you need just US (or U*Sigma in other notation).
This is the dimensionally reduced output of the original document
matrix that you ran with the pca option.
As Ted suggests, you may also use U*S^0.5, which is already produced by
providing uHalfSigma (or its embedded setter analog). The keys of
that output (produced by the getUPath() call) will already contain your
Text document ids as sequence file keys.
d
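
One way to see why V^t does not need to be applied separately: V has orthonormal columns, so A = U S V^t implies A V = U S. In other words, US is already A expressed in the basis of the right singular vectors, with one row per original document. A small pure-Python check, assuming hand-picked toy factors (not Mahout code and not output from any solver):

```python
def multiply(a, b):
    """Plain dense matrix product for small lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


# Hypothetical factors: U and V have orthonormal columns, S is diagonal.
U = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
S = [[5.0, 0.0], [0.0, 2.0]]
V = [[0.6, -0.8], [0.8, 0.6]]

A = multiply(multiply(U, S), transpose(V))  # A = U S V^t
US = multiply(U, S)                         # reduced representation
AV = multiply(A, V)                         # projection of A onto V

# The two agree up to floating-point rounding.
assert all(abs(x - y) < 1e-9
           for row_av, row_us in zip(AV, US)
           for x, y in zip(row_av, row_us))
```

Each row of US lines up with the same row of A, which is consistent with the document ids being carried through as keys.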
