On Sun, Sep 2, 2012 at 12:26 AM, Ahmed Elgohary <aagohary@gmail.com> wrote:
>  I am using k = 30 and p = 2 so (k+p)<99 (Rank(A))
>  I am attaching the csv file of the matrix A
>
Brilliant. And the attachment actually made it through.
>  yes, the difference is significant. Here is the output of the sequential
> SSVD:
>
Indeed. The difference in the sequential case looks very big.
Is it possible that you have nearly identical singular values? This could
leave the singular vectors underdetermined.
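The non-uniqueness is easy to demonstrate. Here is a small sketch in Python/NumPy (rather than R; the shapes and values are made up for illustration): when two singular values coincide, any rotation of the corresponding columns of U and V reproduces the same matrix, so the individual singular vectors are not pinned down.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build A = U S V^T with a repeated singular value (5, 5, 1).
U, _ = np.linalg.qr(rng.standard_normal((6, 3)))
V, _ = np.linalg.qr(rng.standard_normal((4, 3)))
S = np.diag([5.0, 5.0, 1.0])
A = U @ S @ V.T

# Rotate the first two columns of U and V by the same angle.
# Because those two singular values are equal, R S R^T == S.
theta = 0.7
R = np.eye(3)
R[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]
A_rotated = (U @ R) @ S @ (V @ R).T

# Same matrix, different singular vectors.
print(np.allclose(A, A_rotated))  # True
```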
I just looked at this using R. Since it uses LAPACK just like Matlab, the
singular vectors I got are very similar to yours.
> s$u[1,1:10]
 [1] 0.09866006 0.13195226 0.16620985 0.04921311 0.18280221 0.11559161 0.06775417 0.05038383
 [9] 0.01597058 0.03498753
But look at the spectrum:
> s$d[1:30]
 [1] 49.944662  5.396441  5.363498  5.162772  5.124997  5.031488  4.952310  4.837590  4.789828  4.718988
[11]  4.673337  4.576722  4.489299  4.464587  4.423230  4.349272  4.233963  4.069206  4.061468  4.028808
[21]  3.977288  3.927101  3.867809  3.771824  3.715605  3.646800  3.522127  3.494618  3.446958  3.412861
After the first singular value, you have lots of nearly identical values.
This indicates that you probably need to use a power method to
precondition the data.
In a large-scale application, it typically takes much longer to hit this
flat plateau, which makes the SSVD much more economical.
Also, if I step through the basic algorithm using R, it is easy to see that
the basic assumptions of the stochastic projection are violated.
> Y = A %*% matrix(rnorm(3200), 100, 32)
> qr1 = qr(Y)
> norm(qr.Q(qr1) %*% t(qr.Q(qr1)) %*% A - A)
[1] 20.83509
> norm(A)
[1] 58.34794
The issue here is that the random projection does not capture the range of
A very well. This happens when the singular value spectrum is still large
at the cutoff:
> s$d[30:99]
 [1] 3.41286101 3.38844993 3.32769425 3.27354388 3.20851301 3.16175529 3.08556715 3.00915717 2.95061044
[10] 2.92376037 2.87917470 2.82316676 2.75040129 2.66700067 2.61057954 2.58533645 2.53817660 2.49961198
[19] 2.45114845 2.37234167 2.32895166 2.28347210 2.25993598 2.25411242 2.20475212 2.14491854 2.06434148
[28] 2.05223478 1.98081081 1.94691237 1.92036956 1.83571526 1.75956488 1.69612907 1.67369404 1.58851679
[37] 1.54144415 1.51821081 1.45912647 1.45063011 1.41732925 1.35655483 1.31333389 1.27988382 1.18840667
[46] 1.14900347 1.10145851 1.08105797 1.04054897 0.96239289 0.94764227 0.86479263 0.82724565 0.79683667
[55] 0.71886437 0.68518593 0.63232310 0.57656816 0.57076929 0.51649318 0.43080840 0.39319166 0.37404507
[64] 0.34126503 0.31046386 0.25922183 0.17239248 0.10925979 0.07458005 0.02792311
So this is the basic problem.
If we do even just one or two power iterations, things look vastly better:
> norm(qr.Q(qr1) %*% t(qr.Q(qr1)) %*% A - A)/norm(A)
[1] 0.3570835
> A1 = A %*% t(A) %*% A
> Y1 = A1 %*% matrix(rnorm(3200), 100, 32)
> qr1.1 = qr(A1)
> norm(qr.Q(qr1.1) %*% t(qr.Q(qr1.1)) %*% A1 - A1)/norm(A1)
[1] 6.076296e-16
> qr1.1 = qr(Y1)
> norm(qr.Q(qr1.1) %*% t(qr.Q(qr1.1)) %*% A1 - A1)/norm(A1)
[1] 0.00156115
> A2 = A %*% t(A) %*% A %*% t(A) %*% A
> Y2 = A2 %*% matrix(rnorm(3200), 100, 32)
> qr1.2 = qr(Y2)
> norm(qr.Q(qr1.2) %*% t(qr.Q(qr1.2)) %*% A2 - A2)/norm(A2)
[1] 8.318334e-06
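The same progression can be sketched in Python/NumPy. This is not the data from this thread; it is a synthetic matrix with a similar shape of spectrum (one large singular value, then a slowly decaying tail), and `residual` is a hypothetical helper, not part of any library:

```python
import numpy as np

rng = np.random.default_rng(42)

m, n, ell = 100, 99, 32  # ell = k + p sampled columns

# Synthetic A with one large singular value and a slow tail.
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.concatenate([[50.0], np.linspace(5.5, 0.03, n - 1)])
A = U @ np.diag(s) @ V.T

def residual(A, q, rng):
    """Relative spectral-norm residual ||Q Q^T A - A|| / ||A|| of the
    randomized range finder after q power iterations."""
    Y = A @ rng.standard_normal((A.shape[1], ell))
    for _ in range(q):
        Y = A @ (A.T @ Y)   # one power iteration: Y <- A A^T Y
    Q, _ = np.linalg.qr(Y)
    return np.linalg.norm(Q @ Q.T @ A - A, 2) / np.linalg.norm(A, 2)

for q in range(3):
    print(q, residual(A, q, rng))
```

As in the R session, each power iteration sharpens the spectrum seen by the projection (q = 1 corresponds to working with A1 = A A^T A), so the residual drops quickly with q.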
But this is just going to help find the larger singular vectors... the
reconstruction of A is still going to be problematic.
For a more interesting test of the algorithm, within the domain where it is
likely to work, I would recommend starting with a definition of A that has known
singular values. Pick U and V at random, engineer S as desired and then
try this again.
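That construction can be sketched in Python/NumPy like this (the shapes and the decay law are illustrative choices, not a prescription):

```python
import numpy as np

rng = np.random.default_rng(7)

m, n, k = 1000, 500, 30

# Pick orthonormal U and V at random, engineer the spectrum S yourself,
# and form A = U S V^T. The spectrum of A is then known ground truth.
U, _ = np.linalg.qr(rng.standard_normal((m, k)))
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
s = 100.0 * 0.5 ** np.arange(k)   # a sharply decaying, known spectrum
A = U @ np.diag(s) @ V.T

# Sanity check: the SVD of A recovers the engineered singular values.
recovered = np.linalg.svd(A, compute_uv=False)[:k]
print(np.allclose(recovered, s))  # True
```

Any SSVD run on such an A can then be scored directly against the engineered spectrum instead of against another solver's output.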
And remember that SSVD isn't intended for small problems.
> I am not sure what you mean:
> "Did you account for the fact that your matrix is small enough that it
> probably wasn't divided correctly?"
>
I was worried that the data might not get split well between the mappers.
That isn't your problem here.
