There is a question still i guess that we never looked at which is the
effect of different kind of distribution samplers for the Omega matrix
(0mean uniform vs. normal vs. trinary) on small problems though,
although small matrices is not what Mahout is about.
On Mon, Sep 3, 2012 at 12:23 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> ok so i ran Ahmed's numbers with optimal, R sequential version (ssvd.R
> on the wiki) and actual MR. Bottom line, both R ssvd and MR are
> producing quite a bit of rubbish for these k,p parameters and the long
> tailed spectrum beyond just 1st singular vector. Obviously it is hard
> to assert the errors since i'd need a lot of runs to compare and each
> run creates different output.
>
> here we go (1st digits of first 4 singular vectors respectively)
>
> optimal
> s$u[1,1:4]
> [1] 0.09866006 0.13195226 0.16620985 0.04921311
>
> ssvd.R
> s < ssvd.svd(A,k=30,p=2,qiter=1)
>> ss$u[1,1:4]
> [1] 0.09869577 0.10368010 0.09543850 0.14122410
>
> MR SSVD
>> U[1,1:4]
> V2 V3 V4 V5
> 0.09865305 0.21822886 0.05959496 0.06561257
>
> Of course both versions of SSVD produce exact result with k+p=100
> which means internal flow is correct. So i don't see necessary
> discrepancies in the results between MR and ssvd.R in that fairly
> corner case of things.
>
> So this just means that our choice of k,p and q is just too poor for
> such a flat spectrum and such a small matrix beyond the first singular
> vector. Can we do better? Sure. As mentioned, k=30,p=70 gives us exact
> optimal solution:
>
>> ss < ssvd.svd(A,k=30,p=70,qiter=0)
>> ss$u[1,1:4]
> [1] 0.09866006 0.13195226 0.16620985 0.04921311
>>
>
> So this is a rebuttal on the Ahmed's statement
>> I tried multiple values for the parameters P and Q but, that does not seem to solve
>> the problem.
>
> No, it kinda does as demonstrated.
>
> Next. Can we try something in between to work such a bad case to get a
> better tradeoff between errors and effort on precision side? sure.
>
> First, It doesn't make sense to use p<15. in fact p=15 is the default
> one for MR version. So: if you ran MR version with default parameters,
> it should have been something closer to this:
>
> (I'll just run algorithm several times for a more consistent
> assessment of error mean and variance)
>> ss < ssvd.svd(A,k=30,p=15,qiter=1)
>> ss$u[1,1:4]
> [1] 0.09867691 0.15782129 0.14318662 0.01915917
> [1] 0.09868661 0.16606151 0.14881755 0.06277614
> [1] 0.09865008 0.18475458 0.10799953 0.03643756
> [1] 0.09862877 0.16593100 0.15187904 0.04660871
> ...
> So this default set of parameters is quite an improvement although we
> probably still get enough error to the right hand of 4th singular
> vector.
> (i assume vector sign is not important, it means sign of V is flipped
> as well i guess).
>
>
> Finally, we can further try to improve with more power iterations.
> ss < ssvd.svd(A,k=30,p=15,qiter=2)
> ss$u[1,1:4]
>
>
>> ss < ssvd.svd(A,k=30,p=15,qiter=2)
>> ss$u[1,1:4]
> [1] 0.09866002 0.13100879 0.16691754 0.04754525
> [1] 0.09866011 0.13027399 0.16417364 0.04453192
> [1] 0.09866004 0.13453556 0.16704920 0.04517045
> [1] 0.09866006 0.12905608 0.16230813 0.04353147
>
> Now that adds even more stability in the 4th vector, doesn't it? The
> error mean is demonstrably decreased.
>
> Bottom line points.
>
> First, there's not much point to investigate corner cases  small
> matrices, flat spectrum... You may force SSVD to produce exact
> solution though, but it kind of defeats the purpose, because you can
> as easily run full rank SVD on a matrix in matlab. Your parameters
> chosen still produce acceptable results for the first vector where
> spectrum is trendy but less so where the spectrum is flat.
>
> Second, you can control your errors through oversampling (p) and power
> iterations (q) and you get much flexibility here. You balance
> precision with flops tradeoff.
>
> Third, in real life it doesn't make sense to have high accuracy
> measuring lfat spectrum. Flat spectrum just means you don't have
> trends in those directions, i.e. essentially a random noise. If you
> have random noise, direction of that noise is usually of little
> interest, but because spectrum (i.e. singular values) is measured
> better with the method, you just can state that you have mostly noise
> after certain nth singular vector and chances are in some cases you
> just wouldn't care. And if you do care, you still have a choice to
> work the oversampling and perhaps q iterations.
>
> Fourth, unless k+p=rank(A), each run will produce slightly different
> results and errors so it doesn't make sense to make any error
> comparison just between single runs of the variations. Instead, it
> makes sense to compare error mean and variations on a better number of
> runs.
>
> d
>
> On Sun, Sep 2, 2012 at 12:00 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> Did Ahmed even use a power iteration?
>>
>> On Sun, Sep 2, 2012 at 1:35 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>
>>> but there is still a concern in a sense that power iterations
>>> should've helped more than they did. I'll take a closer look but it
>>> will take me a while to figure if there's something we can improve
>>> here.
>>>
