On Sunday, 2 September 2012, Dmitriy Lyubimov wrote:
> I'll take a look although it may take me a while to find time.
>
> I have an SSVD flow with power iterations in R and i collate results from
> that and the Java version. I don't immediately have code to convert from
> csv/R to Mahout (only in the reverse direction)
Is this from Danny Bickson useful here?
http://bickson.blogspot.com/2011/02/mahoutsvdmatrixfactorization.html
Dan
(pls excuse formatting; using webmail on a phone)
> which is a shame, i
> should've made more progress on R and Mahout integration. Which is why
> it will take me time.
>
> Indeed, SSVD is for large matrices and we did accuracy comparisons on
> large data sets (e.g. wikipedia). LLNL doesn't work as well on small
> matrices.
>
> However if i find that uniform 0mean distribution is problematic for
> small problems, i may want to do a patch to address that.
>
> Also the fact that your data in the tail has a flat spectrum adds to
> error a lot. I.e. what you have is one principal direction and a lot
> of random noise around it, and like i said before, random data is not
> going to produce good results beyond that one principal direction.
> Detecting a trend in what is mostly noise is not working well with
> this method esp. in conjunction with so few samples.
>
> but there is still a concern in a sense that power iterations
> should've helped more than they did. I'll take a closer look but it
> will take me a while to figure if there's something we can improve
> here.
>
> One thing is the way to read U matrix: strictly speaking, rows of U
> matrix are not necessarily coming in the same order as rows of A in
> the final output. But they are keyed by the same keys (so it is
> possible that what you think is U[1,] is actually something else). But
> it will take me some time to verify that.
>
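On the row-order point: if the rows of U do come back keyed but not in the order of A, one way to line them up before comparing is to sort by key. A minimal sketch in plain Java, assuming integer row keys and rows already read into a map. KeyedRows and orderByKey are made-up names, not Mahout API, and reading the sequence file is left out.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Mahout: puts keyed rows of U into key order.
class KeyedRows {
    // Rows arrive as (key, vector) pairs in arbitrary order; a TreeMap sorts them by key.
    static double[][] orderByKey(Map<Integer, double[]> keyedRows) {
        TreeMap<Integer, double[]> sorted = new TreeMap<>(keyedRows);
        double[][] u = new double[sorted.size()][];
        int i = 0;
        for (double[] row : sorted.values()) {
            u[i++] = row; // the i-th smallest key becomes row i
        }
        return u;
    }
}
```

With the rows ordered this way, row 0 of the result corresponds to the smallest key, so it only matches U[1,] in R/matlab if the keys were assigned in the row order of A.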
>
> On Sat, Sep 1, 2012 at 9:26 PM, Ahmed Elgohary <aagohary@gmail.com> wrote:
> >  I am using k = 30 and p = 2 so (k+p)<99 (Rank(A))
> >  I am attaching the csv file of the matrix A
> >  yes, the difference is significant. Here is the output of the sequential
> > SSVD:
> > u(1,1:10):
> > 0.0987 0.1334 0.1676 0.0251 0.2201 0.0629 0.0601
> > 0.0575 0.0079 0.0519
> >
> > and the output of matlab's svd:
> > u(1,1:10):
> > 0.0987 0.1320 0.1662 0.0492 0.1828 0.1156 0.0678
> > 0.0504 0.0160 0.0350
> >
> > and the output of mahout's SSVD:
> > u(1,1:10):
> > 0.0962 0.1924 0.2125 0.1668 0.0188 0.0867 0.0908
> > 0.0264 0.0443 0.0207
> >
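To put a number on "significant", the three printed rows above can be compared entry by entry. This is only a sketch: RowDiff and maxAbsDiff are made-up names, the values are the four-decimal figures quoted above, and it looks at a single row of U, so it says nothing about the rest of the factorization.

```java
// Hypothetical helper for comparing the printed rows of U entry by entry.
class RowDiff {
    // u(1,1:10) as quoted in this thread, four decimals each.
    static final double[] SEQUENTIAL =
        {0.0987, 0.1334, 0.1676, 0.0251, 0.2201, 0.0629, 0.0601, 0.0575, 0.0079, 0.0519};
    static final double[] MATLAB =
        {0.0987, 0.1320, 0.1662, 0.0492, 0.1828, 0.1156, 0.0678, 0.0504, 0.0160, 0.0350};
    static final double[] MAHOUT =
        {0.0962, 0.1924, 0.2125, 0.1668, 0.0188, 0.0867, 0.0908, 0.0264, 0.0443, 0.0207};

    // Largest absolute difference between corresponding entries.
    static double maxAbsDiff(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.printf("sequential vs matlab: %.4f%n", maxAbsDiff(SEQUENTIAL, MATLAB));
        System.out.printf("mahout vs matlab:     %.4f%n", maxAbsDiff(MAHOUT, MATLAB));
    }
}
```

On these numbers the sequential SSVD is off from matlab by at most 0.0527 (sixth entry), and the Mahout run by at most 0.1640 (fifth entry).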
> >
> >  The code I am using is like this:
> > //convert A.csv to a sequencefile /data/A
> > SSVDSolver ssvdSolver = new SSVDSolver(new Configuration(),
> >     new Path[] { new Path("/data/A") }, new Path("/ssvd/output"), 1000, 30, 2, 1);
> > ssvdSolver.setQ(1);
> > ssvdSolver.run();
> > //convert ssvdSolver.getUPath() to a csv file
> >
> > I am not sure what you mean:
> >
> > "Did you account for the fact that your matrix is small enough that it
> > probably wasn't divided correctly?"
> >
> > ahmed
> >
> >
> > On Sat, Sep 1, 2012 at 10:52 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >>
> >> No, it's zero mean uniform of course. A murmur scaled to -1...1 range.
> >>
> >> I used to use normal too but you advised there was not much difference
> >> and i actually did not see much either.
> >>
> >> I also think that in this case me moving the input to R via decimals
> >> actually created precision errors too. I will double check. And my
> >> synthetic test input has a flat tail in the lower singular numbers, which
> >> of course messes up some singular vectors in the tail but doesn't affect
> >> singular values. I will check for these things and look again. But i
> >> don't see a fundamental problem with the results i see; they are the same
> >> down to the eighth digit after the dot, so there is no fundamental problem
> >> here.
> >> On Sep 1, 2012 1:03 AM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
> >>
> >> > Oho...
> >> >
> >> > If the uniform randoms have nonzero means, then this could be a
> >> > significant effect that leads to some loss of significance in the
> >> > results.
> >> > For small matrices the resulting difference shouldn't be huge but it
> >> > might well be observable.
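For reference, "zero mean uniform" here amounts to mapping a hash of the entry's position to [0, 1) and then shifting it to [-1, 1). A rough sketch follows, with a splitmix64-style mixer standing in for murmur. ZeroMeanOmega, entry and mean are made-up names, and this is not the code Mahout actually uses.

```java
// Sketch only: a splitmix64-style mixer stands in for the murmur hash mentioned in this thread.
class ZeroMeanOmega {
    // Deterministic pseudo-random entry in [-1, 1) for a given seed and flat index.
    static double entry(long seed, long index) {
        long z = seed + (index + 1) * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        z ^= z >>> 31;
        double unit = (z >>> 11) * 0x1.0p-53; // top 53 bits, uniform in [0, 1)
        return 2.0 * unit - 1.0;              // shift and scale to [-1, 1), mean 0
    }

    // Sample mean of the first n entries; should be close to 0 for large n.
    static double mean(long seed, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += entry(seed, i);
        }
        return sum / n;
    }
}
```

Dropping the final shift (returning unit directly) would give the nonzero-mean case described above, with every entry of the random matrix biased towards 0.5.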
> >> >
> >> > On Sat, Sep 1, 2012 at 3:45 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> >> > wrote:
> >> >
> >> > > sorry, i meant "random trinary"
> >> > >
> >> > > On Sat, Sep 1, 2012 at 12:39 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> >> > > wrote:
> >> > > > Hm. There is a slight error between R full rank SVD and Mahout MR SSVD
> >> > > > for my unit test modified for 100x100, k=3, p=10.
> >> > > >
> >> > > > First left vector (R/SSVD) :
> >> > > >> s$u[,1]
> >> > > >  [1] 0.050741660 0.083985411 0.078767108 0.044487425 0.010380367
> >> > > >  [6] 0.069635451 0.158337400 0.029102044 0.168156173 0.127921554
> >> > > > [11] 0.012698809 0.027140724 0.069357925 0.015605283 0.076614201
> >> > > > [16] 0.158582188 0.143656275 0.033886221 0.055111330 0.029299261
> >> > > > [21] 0.059667350 0.039205405 0.042027376 0.048541162 0.158267382
> >> > > > [
