The dense random matrices should do somewhat better than the kind of sparse
trinary matrix.
On Mon, Sep 3, 2012 at 3:48 PM, Dmitriy Lyubimov wrote:
> There is a question still i guess that we never looked at which is the
> effect of different kind of distribution samplers for the Omega matrix
> (0-mean uniform vs. normal vs. trinary) on small problems though,
> although small matrices is not what Mahout is about.
>
> On Mon, Sep 3, 2012 at 12:23 PM, Dmitriy Lyubimov
> wrote:
> > ok so i ran Ahmed's numbers with optimal, R sequential version (ssvd.R
> > on the wiki) and actual MR. Bottom line, both R ssvd and MR are
> > producing quite a bit of rubbish for these k,p parameters and the long
> > tailed spectrum beyond just 1st singular vector. Obviously it is hard
> > to assert the errors since i'd need a lot of runs to compare and each
> > run creates different output.
> >
> > here we go (1st digits of first 4 singular vectors respectively)
> >
> > optimal
> > s$u[1,1:4]
> > [1] -0.09866006 -0.13195226 -0.16620985 -0.04921311
> >
> > ssvd.R
> > s <- ssvd.svd(A,k=30,p=2,qiter=1)
> >> ss$u[1,1:4]
> > [1] -0.09869577 -0.10368010 0.09543850 0.14122410
> >
> > MR SSVD
> >> U[1,1:4]
> > V2 V3 V4 V5
> > -0.09865305 -0.21822886 0.05959496 0.06561257
> >
> > Of course both versions of SSVD produce exact result with k+p=100
> > which means internal flow is correct. So i don't see necessary
> > discrepancies in the results between MR and ssvd.R in that fairly
> > corner case of things.
> >
> > So this just means that our choice of k,p and q is just too poor for
> > such a flat spectrum and such a small matrix beyond the first singular
> > vector. Can we do better? Sure. As mentioned, k=30,p=70 gives us exact
> > optimal solution:
> >
> >> ss <- ssvd.svd(A,k=30,p=70,qiter=0)
> >> ss$u[1,1:4]
> > [1] -0.09866006 0.13195226 0.16620985 -0.04921311
> >>
> >
> > So this is a rebuttal on the Ahmed's statement
> >> I tried multiple values for the parameters P and Q but, that does not
> seem to solve
> >> the problem.
> >
> > No, it kinda does as demonstrated.
> >
> > Next. Can we try something in between to work such a bad case to get a
> > better trade-off between errors and effort on precision side? sure.
> >
> > First, It doesn't make sense to use p<15. in fact p=15 is the default
> > one for MR version. So: if you ran MR version with default parameters,
> > it should have been something closer to this:
> >
> > (I'll just run algorithm several times for a more consistent
> > assessment of error mean and variance)
> >> ss <- ssvd.svd(A,k=30,p=15,qiter=1)
> >> ss$u[1,1:4]
> > [1] -0.09867691 -0.15782129 -0.14318662 -0.01915917
> > [1] -0.09868661 -0.16606151 -0.14881755 0.06277614
> > [1] -0.09865008 -0.18475458 0.10799953 0.03643756
> > [1] -0.09862877 0.16593100 0.15187904 -0.04660871
> > ...
> > So this default set of parameters is quite an improvement although we
> > probably still get enough error to the right hand of 4th singular
> > vector.
> > (i assume vector sign is not important, it means sign of V is flipped
> > as well i guess).
> >
> >
> > Finally, we can further try to improve with more power iterations.
> > ss <- ssvd.svd(A,k=30,p=15,qiter=2)
> > ss$u[1,1:4]
> >
> >
> >> ss <- ssvd.svd(A,k=30,p=15,qiter=2)
> >> ss$u[1,1:4]
> > [1] -0.09866002 -0.13100879 0.16691754 -0.04754525
> > [1] -0.09866011 -0.13027399 -0.16417364 -0.04453192
> > [1] -0.09866004 -0.13453556 -0.16704920 -0.04517045
> > [1] -0.09866006 -0.12905608 0.16230813 -0.04353147
> >
> > Now that adds even more stability in the 4th vector, doesn't it? The
> > error mean is demonstrably decreased.
> >
> > Bottom line points.
> >
> > First, there's not much point to investigate corner cases -- small
> > matrices, flat spectrum... You may force SSVD to produce exact
> > solution though, but it kind of defeats the purpose, because you can
> > as easily run full rank SVD on a matrix in matlab. Your parameters
> > chosen still produce acceptable results for the first vector where
> > spectrum is trendy but less so where the spectrum is flat.
> >
> > Second, you can control your errors through oversampling (p) and power
> > iterations (q) and you get much flexibility here. You balance
> > precision with flops tradeoff.
> >
> > Third, in real life it doesn't make sense to have high accuracy
> > measuring lfat spectrum. Flat spectrum just means you don't have
> > trends in those directions, i.e. essentially a random noise. If you
> > have random noise, direction of that noise is usually of little
> > interest, but because spectrum (i.e. singular values) is measured
> > better with the method, you just can state that you have mostly noise
> > after certain n-th singular vector and chances are in some cases you
> > just wouldn't care. And if you do care, you still have a choice to
> > work the oversampling and perhaps q iterations.
> >
> > Fourth, unless k+p=rank(A), each run will produce slightly different
> > results and errors so it doesn't make sense to make any error
> > comparison just between single runs of the variations. Instead, it
> > makes sense to compare error mean and variations on a better number of
> > runs.
> >
> > -d
> >
> > On Sun, Sep 2, 2012 at 12:00 AM, Ted Dunning
> wrote:
> >> Did Ahmed even use a power iteration?
> >>
> >> On Sun, Sep 2, 2012 at 1:35 AM, Dmitriy Lyubimov
> wrote:
> >>
> >>> but there is still a concern in a sense that power iterations
> >>> should've helped more than they did. I'll take a closer look but it
> >>> will take me a while to figure if there's something we can improve
> >>> here.
> >>>
>