# mahout-user mailing list archives

##### Site index · List index
Message view
Top
From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: SSVD Wrong Singular Vectors
Date Sun, 02 Sep 2012 05:10:47 GMT
```On Sun, Sep 2, 2012 at 12:26 AM, Ahmed Elgohary <aagohary@gmail.com> wrote:

> - I am using k = 30 and p = 2 so (k+p)<99 (Rank(A))
> - I am attaching the csv file of the matrix A
>

Brilliant.  And the attachment actually made it through.

> - yes, the difference is significant. Here is the output of the sequential
> SSVD:
>

Indeed.  The difference in the sequential case looks very big.

Is it possible that you have nearly identical singular values?  This could
leave the singular vectors under-determined.

I just looked at this using R.  Since it uses LAPACK just like Matlab, the
singular vectors I got are very similar to yours.

> s\$u[1,1:10]
 -0.09866006 -0.13195226  0.16620985  0.04921311 -0.18280221
0.11559161 -0.06775417  0.05038383
 -0.01597058  0.03498753

But look at the spectrum:

> s\$d[1:30]
 49.944662  5.396441  5.363498  5.162772  5.124997  5.031488  4.952310
4.837590  4.789828  4.718988
  4.673337  4.576722  4.489299  4.464587  4.423230  4.349272  4.233963
4.069206  4.061468  4.028808
  3.977288  3.927101  3.867809  3.771824  3.715605  3.646800  3.522127
3.494618  3.446958  3.412861

After the first singular value, you have lots of nearly identical values.
This indicates that you probably need to use a power method to
pre-condition the data.

In a large-scale application, it is typically a much longer ways before you
hit this flat plateau.  This makes the SSVD much more economical.

Also, if I step through the basic algorithm using R, it is easy to see that
the basic assumptions of the stochastic projection are violated.

> Y = A %*% matrix(rnorm(3200), 100, 32)
> qr1 = qr(Y)
> norm(qr.Q(qr1) %*% t(qr.Q(qr1)) %*% A - A)
 20.83509
> norm(A)
 58.34794

The issue here is that the random projection is not a very good
approximation of A.  THis happens if the singular value spectrum is still
large at the cutoff:

> s\$d[30:99]
 3.41286101 3.38844993 3.32769425 3.27354388 3.20851301 3.16175529
3.08556715 3.00915717 2.95061044
 2.92376037 2.87917470 2.82316676 2.75040129 2.66700067 2.61057954
2.58533645 2.53817660 2.49961198
 2.45114845 2.37234167 2.32895166 2.28347210 2.25993598 2.25411242
2.20475212 2.14491854 2.06434148
 2.05223478 1.98081081 1.94691237 1.92036956 1.83571526 1.75956488
1.69612907 1.67369404 1.58851679
 1.54144415 1.51821081 1.45912647 1.45063011 1.41732925 1.35655483
1.31333389 1.27988382 1.18840667
 1.14900347 1.10145851 1.08105797 1.04054897 0.96239289 0.94764227
0.86479263 0.82724565 0.79683667
 0.71886437 0.68518593 0.63232310 0.57656816 0.57076929 0.51649318
0.43080840 0.39319166 0.37404507
 0.34126503 0.31046386 0.25922183 0.17239248 0.10925979 0.07458005
0.02792311

So this is the basic problem.

If we do even just one or two power iterations things look vats better

> norm(qr.Q(qr1) %*% t(qr.Q(qr1)) %*% A - A)/norm(A)
 0.3570835
> A1 = A %*% t(A) %*% A
> Y1 = A1 %*% matrix(rnorm(3200), 100, 32)
> qr1.1 = qr(A1)
> norm(qr.Q(qr1.1) %*% t(qr.Q(qr1.1)) %*% A1 - A1)/norm(A1)
 6.076296e-16
> qr1.1 = qr(Y1)
> norm(qr.Q(qr1.1) %*% t(qr.Q(qr1.1)) %*% A1 - A1)/norm(A1)
 0.00156115
> A2 = A %*% t(A) %*% A %*% t(A) %*% A
> Y2 = A1 %*% matrix(rnorm(3200), 100, 32)
> Y2 = A2 %*% matrix(rnorm(3200), 100, 32)
> qr1.2 = qr(Y2)
> norm(qr.Q(qr1.2) %*% t(qr.Q(qr1.2)) %*% A2 - A2)/norm(A2)
 8.318334e-06

But this is just going to help find the larger singular vectors... the
reconstruction of A is still going to be problematic.

For a more interesting test of the algorithm within the domain it is liable
to work, I would recommend starting with a definition of A that has known
singular values.  Pick U and V at random, engineer S as desired and then
try this again.

And remember that SSVD isn't intended for small problems.

> I am not sure what you mean:
> "Did you account for the fact that your matrix is small enough that it
> probably wasn't divided correctly?"
>

I was worried that data might not get well split between mappers.  That
isn't your problem.

```
Mime
• Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message