mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: SSVD Wrong Singular Vectors
Date Mon, 03 Sep 2012 19:23:55 GMT
ok so i ran Ahmed's numbers with optimal, R sequential version (ssvd.R
on the wiki) and actual MR. Bottom line, both R ssvd and MR are
producing quite a bit of rubbish for these k,p parameters and the long
tailed spectrum beyond just 1st singular vector. Obviously it is hard
to assert the errors since i'd need a lot of runs to compare and each
run creates different output.

here we go (1st digits of first 4 singular vectors respectively)

optimal
s$u[1,1:4]
[1] -0.09866006 -0.13195226 -0.16620985 -0.04921311

ssvd.R
s <- ssvd.svd(A,k=30,p=2,qiter=1)
> ss$u[1,1:4]
[1] -0.09869577 -0.10368010  0.09543850  0.14122410

MR SSVD
> U[1,1:4]
         V2          V3          V4          V5
-0.09865305 -0.21822886  0.05959496  0.06561257

Of course both versions of SSVD produce exact result with k+p=100
which means internal flow is correct. So i don't see necessary
discrepancies in the results between MR and ssvd.R in that fairly
corner case of things.

So this just means that our choice of k,p and q is just too poor for
such a flat spectrum and such a small matrix beyond the first singular
vector. Can we do better? Sure. As mentioned, k=30,p=70 gives us exact
optimal solution:

> ss <- ssvd.svd(A,k=30,p=70,qiter=0)
> ss$u[1,1:4]
[1] -0.09866006  0.13195226  0.16620985 -0.04921311
>

So this is a rebuttal on the Ahmed's statement
>  I tried multiple values for the parameters P and Q but, that does not seem to solve
> the problem.

No, it kinda does as demonstrated.

Next. Can we try something in between to work such a bad case to get a
better trade-off between errors and effort on precision side? sure.

First, It doesn't make sense to use p<15. in fact p=15 is the default
one for MR version. So: if you ran MR version with default parameters,
it should have been something closer to this:

(I'll just run algorithm several times for a more consistent
assessment of error mean and variance)
> ss <- ssvd.svd(A,k=30,p=15,qiter=1)
> ss$u[1,1:4]
[1] -0.09867691 -0.15782129 -0.14318662 -0.01915917
[1] -0.09868661 -0.16606151 -0.14881755  0.06277614
[1] -0.09865008 -0.18475458  0.10799953  0.03643756
[1] -0.09862877  0.16593100  0.15187904 -0.04660871
...
So this default set of parameters is quite an improvement although we
probably still get enough error to the right hand of 4th singular
vector.
(i assume vector sign is not important, it means sign of V is flipped
as well i guess).


Finally, we can further try to improve with more power iterations.
ss <- ssvd.svd(A,k=30,p=15,qiter=2)
ss$u[1,1:4]


> ss <- ssvd.svd(A,k=30,p=15,qiter=2)
> ss$u[1,1:4]
[1] -0.09866002 -0.13100879  0.16691754 -0.04754525
[1] -0.09866011 -0.13027399 -0.16417364 -0.04453192
[1] -0.09866004 -0.13453556 -0.16704920 -0.04517045
[1] -0.09866006 -0.12905608  0.16230813 -0.04353147

Now that adds even more stability in the 4th vector, doesn't it? The
error mean is demonstrably decreased.

Bottom line points.

First, there's not much point to investigate corner cases -- small
matrices, flat spectrum... You may force SSVD to produce exact
solution though, but it kind of defeats the purpose, because you can
as easily run full rank SVD on a matrix in matlab. Your parameters
chosen still produce acceptable results for the first vector where
spectrum is trendy but less so  where the spectrum is flat.

Second, you can control your errors through oversampling (p) and power
iterations (q) and you get much flexibility here. You balance
precision with flops tradeoff.

Third, in real life it doesn't make sense to have high accuracy
measuring lfat spectrum. Flat spectrum just means you don't have
trends in those directions, i.e. essentially a random noise. If you
have random noise, direction of that noise is usually of little
interest, but because spectrum (i.e. singular values) is measured
better with the method, you just can state that you have mostly noise
after certain n-th singular vector and chances are in some cases  you
just wouldn't care. And if you do care, you still have a choice to
work the oversampling and perhaps q iterations.

Fourth, unless k+p=rank(A), each run will produce slightly different
results and errors so it doesn't make sense to make any error
comparison just between single runs of the variations. Instead, it
makes sense to compare error mean and variations on a better number of
runs.

-d

On Sun, Sep 2, 2012 at 12:00 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> Did Ahmed even use a power iteration?
>
> On Sun, Sep 2, 2012 at 1:35 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>
>> but there is still a concern in a sense that power iterations
>> should've helped more than they did. I'll take a closer look but it
>> will take me a while to figure if there's something we can improve
>> here.
>>

Mime
View raw message