A little albeit perhaps rough error study, while we are on the topic:
For completeness, i also ran comparison of sequential svd results
(ssvd.R prototype) on Ahmed's data (this uses normal sampler) and the
MR version.
I actually took time and added option of 0mean uniform to sequential
R ssvd prototype (so it is more or less identical in stochastic part).
Then i ran R version 50 times on Ahmet's data and actually took time
to run 24 times the MR ssvd on the same data. They all had parameters
k=30,p=15,q=1.
(the caveat is that i actually had to hack SSVDSolver because there
seemed to have been an edit to it and its random generator seems to be
seeded with the same seed in unit tests, not sure why. I will post the
question to developer who made edit on the dev forum as to how and why
it works this way).
Then I computed means of absolute errors in U[1,] in each case, i.e.
errors in first items of first 30 singular vectors, since this seems
to be the subject of discussion.
Then i subtracted error means between two methods (+ sign means
smaller error for MR version, sign means smaller error for R
sequential version):
2.490580e06 6.065328e03 1.954353e03 1.557188e03 3.680165e03
9.346332e04 6.052074e04 2.622627e03 6.930656e04 2.608697e03
1.218882e03 1.042435e03 1.473642e03 8.322338e03 6.035364e03
6.508083e03 6.390501e03 8.441491e03 1.169098e02 5.618290e04
1.061915e02 7.908856e03 2.295738e03 2.397220e04 9.223938e03
3.254358e03 4.996947e03 5.478588e04 4.909232e03 3.874268e03
95% confidence interval for the error mean measurements of MR version is, +/:
5.754773e06 9.002358e03 9.890168e03 1.057098e02 7.995073e03 9.310558e03
1.002221e02 5.687491e03 7.069399e03 8.315481e03 9.091700e03 1.286121e02
1.265919e02 1.859940e02 1.523656e02 1.398035e02 9.846969e03 3.852648e02
1.943268e02 2.668289e02 2.574338e02 2.376454e02 1.330533e02 1.425717e02
2.280331e02 2.086706e02 2.326773e02 2.981179e02 2.949766e02 2.059868e02
which seems to be consistently greater than difference in the errors.
Also, we see intermittent pattern of one method taking precision edge
over another.
This basically tells me there's no statistically difference in
precision between sequential and MR version errors with the same
parameters.
d
On Mon, Sep 3, 2012 at 12:23 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> ok so i ran Ahmed's numbers with optimal, R sequential version (ssvd.R
> on the wiki) and actual MR. Bottom line, both R ssvd and MR are
> producing quite a bit of rubbish for these k,p parameters and the long
> tailed spectrum beyond just 1st singular vector. Obviously it is hard
> to assert the errors since i'd need a lot of runs to compare and each
> run creates different output.
>
> here we go (1st digits of first 4 singular vectors respectively)
>
> optimal
> s$u[1,1:4]
> [1] 0.09866006 0.13195226 0.16620985 0.04921311
>
> ssvd.R
> s < ssvd.svd(A,k=30,p=2,qiter=1)
>> ss$u[1,1:4]
> [1] 0.09869577 0.10368010 0.09543850 0.14122410
>
> MR SSVD
>> U[1,1:4]
> V2 V3 V4 V5
> 0.09865305 0.21822886 0.05959496 0.06561257
>
> Of course both versions of SSVD produce exact result with k+p=100
> which means internal flow is correct. So i don't see necessary
> discrepancies in the results between MR and ssvd.R in that fairly
> corner case of things.
>
> So this just means that our choice of k,p and q is just too poor for
> such a flat spectrum and such a small matrix beyond the first singular
> vector. Can we do better? Sure. As mentioned, k=30,p=70 gives us exact
> optimal solution:
>
>> ss < ssvd.svd(A,k=30,p=70,qiter=0)
>> ss$u[1,1:4]
> [1] 0.09866006 0.13195226 0.16620985 0.04921311
>>
>
> So this is a rebuttal on the Ahmed's statement
>> I tried multiple values for the parameters P and Q but, that does not seem to solve
>> the problem.
>
> No, it kinda does as demonstrated.
>
> Next. Can we try something in between to work such a bad case to get a
> better tradeoff between errors and effort on precision side? sure.
>
> First, It doesn't make sense to use p<15. in fact p=15 is the default
> one for MR version. So: if you ran MR version with default parameters,
> it should have been something closer to this:
>
> (I'll just run algorithm several times for a more consistent
> assessment of error mean and variance)
>> ss < ssvd.svd(A,k=30,p=15,qiter=1)
>> ss$u[1,1:4]
> [1] 0.09867691 0.15782129 0.14318662 0.01915917
> [1] 0.09868661 0.16606151 0.14881755 0.06277614
> [1] 0.09865008 0.18475458 0.10799953 0.03643756
> [1] 0.09862877 0.16593100 0.15187904 0.04660871
> ...
> So this default set of parameters is quite an improvement although we
> probably still get enough error to the right hand of 4th singular
> vector.
> (i assume vector sign is not important, it means sign of V is flipped
> as well i guess).
>
>
> Finally, we can further try to improve with more power iterations.
> ss < ssvd.svd(A,k=30,p=15,qiter=2)
> ss$u[1,1:4]
>
>
>> ss < ssvd.svd(A,k=30,p=15,qiter=2)
>> ss$u[1,1:4]
> [1] 0.09866002 0.13100879 0.16691754 0.04754525
> [1] 0.09866011 0.13027399 0.16417364 0.04453192
> [1] 0.09866004 0.13453556 0.16704920 0.04517045
> [1] 0.09866006 0.12905608 0.16230813 0.04353147
>
> Now that adds even more stability in the 4th vector, doesn't it? The
> error mean is demonstrably decreased.
>
> Bottom line points.
>
> First, there's not much point to investigate corner cases  small
> matrices, flat spectrum... You may force SSVD to produce exact
> solution though, but it kind of defeats the purpose, because you can
> as easily run full rank SVD on a matrix in matlab. Your parameters
> chosen still produce acceptable results for the first vector where
> spectrum is trendy but less so where the spectrum is flat.
>
> Second, you can control your errors through oversampling (p) and power
> iterations (q) and you get much flexibility here. You balance
> precision with flops tradeoff.
>
> Third, in real life it doesn't make sense to have high accuracy
> measuring lfat spectrum. Flat spectrum just means you don't have
> trends in those directions, i.e. essentially a random noise. If you
> have random noise, direction of that noise is usually of little
> interest, but because spectrum (i.e. singular values) is measured
> better with the method, you just can state that you have mostly noise
> after certain nth singular vector and chances are in some cases you
> just wouldn't care. And if you do care, you still have a choice to
> work the oversampling and perhaps q iterations.
>
> Fourth, unless k+p=rank(A), each run will produce slightly different
> results and errors so it doesn't make sense to make any error
> comparison just between single runs of the variations. Instead, it
> makes sense to compare error mean and variations on a better number of
> runs.
>
> d
>
> On Sun, Sep 2, 2012 at 12:00 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> Did Ahmed even use a power iteration?
>>
>> On Sun, Sep 2, 2012 at 1:35 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>
>>> but there is still a concern in a sense that power iterations
>>> should've helped more than they did. I'll take a closer look but it
>>> will take me a while to figure if there's something we can improve
>>> here.
>>>
