mahout-user mailing list archives

From Dan Brickley <dan...@danbri.org>
Subject Re: SSVD Wrong Singular Vectors
Date Sun, 02 Sep 2012 10:37:14 GMT
On Sunday, 2 September 2012, Dmitriy Lyubimov wrote:

> I'll take a look although it may take me a while to find time.
>
> I have an SSVD flow with power iterations in R and i collate results from
> that and the java version. I don't immediately have code to convert from
> csv/R to Mahout (only in the reverse direction)


Is this from Danny Bickson useful here?

http://bickson.blogspot.com/2011/02/mahout-svd-matrix-factorization.html
Dan

(pls excuse formatting; using webmail on a phone)


> which is a shame, i
> should've made more progress on R and Mahout integration. Which is why
> it will take me time.
>
> Indeed, SSVD is for large matrices and we did accuracy comparisons on
> large data sets (e.g. wikipedia). LLNL doesn't work as good on small
> matrices.
>
> However if i find that uniform 0-mean distribution is problematic for
> small problems, i may want to do a patch to address that.
>
> Also the fact that your data in the tail has a flat spectrum adds to
> error a lot. I.e. what you have is one principal direction and a lot
> of random noise around it, and like i said before, random data is not
> going to produce good results beyond that one principal direction.
> Detecting a trend in what is mostly noise is not working well with
> this method esp. in conjunction with so few samples.
>
> but there is still a concern in a sense that power iterations
> should've helped more than they did. I'll take a closer look but it
> will take me a while to figure if there's something we can improve
> here.
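
To make the role of power iterations concrete, here is a minimal sketch of the single-vector special case: start from a random projection y = A*omega, then repeatedly apply A*A^T and renormalize. This is not Mahout's SSVD code; the class and method names are hypothetical, the matrix is a toy dense array, and omega is passed in explicitly so the run is reproducible.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Mahout.
final class PowerIterationSketch {

    /** Computes y = A x. */
    static double[] multiply(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += a[i][j] * x[j];
            }
        }
        return y;
    }

    /** Computes y = A^T x. */
    static double[] multiplyTransposed(double[][] a, double[] x) {
        double[] y = new double[a[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < y.length; j++) {
                y[j] += a[i][j] * x[i];
            }
        }
        return y;
    }

    /** Returns v scaled to unit Euclidean length. */
    static double[] normalize(double[] v) {
        double norm = 0.0;
        for (double x : v) {
            norm += x * x;
        }
        norm = Math.sqrt(norm);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    /** Estimates the first left singular vector as normalize((A A^T)^q A omega). */
    static double[] leadingLeftVector(double[][] a, double[] omega, int q) {
        double[] y = normalize(multiply(a, omega));
        for (int i = 0; i < q; i++) {
            // Renormalizing after every pass keeps the numbers in range.
            y = normalize(multiply(a, multiplyTransposed(a, y)));
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] a = {{3, 0, 0}, {0, 1, 0}, {0, 0, 0.5}}; // singular values 3, 1, 0.5
        double[] omega = {1, 1, 1};                         // fixed stand-in for a random vector
        for (int q : new int[] {0, 1, 5}) {
            System.out.println("q=" + q + " -> " + Arrays.toString(leadingLeftVector(a, omega, q)));
        }
    }
}
```

In this toy case the unwanted component along the second singular direction is damped by a factor of (sigma2/sigma1)^(2q+1) relative to the first. When the spectrum is flat that ratio is close to 1, so extra iterations buy very little, which is consistent with the remark about the flat tail.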
>
> One thing is the way to read U matrix: strictly speaking, rows of U
> matrix are not necessarily coming in the same order as rows of A in
> the final output. But they are keyed by the same keys (so it is
> possible that what you think is U[1,] is actually something else). But
> it will take me some time to verify that.
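
A minimal sketch of what realigning by key could look like, assuming the rows of U have already been read into (key, vector) pairs. Reading them out of Mahout's output files is not shown; the class name, the Integer key type, and the in-memory map are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Mahout.
final class AlignRowsByKey {

    /** Returns the rows of U arranged in the same key order as the rows of A. */
    static double[][] reorder(List<Integer> aKeys, Map<Integer, double[]> uRowsByKey) {
        double[][] out = new double[aKeys.size()][];
        for (int i = 0; i < aKeys.size(); i++) {
            double[] row = uRowsByKey.get(aKeys.get(i));
            if (row == null) {
                throw new IllegalArgumentException("No U row for key " + aKeys.get(i));
            }
            out[i] = row;
        }
        return out;
    }

    public static void main(String[] args) {
        // Rows of U arrive in arbitrary order but carry the keys of A.
        Map<Integer, double[]> u = new HashMap<>();
        u.put(2, new double[] {0.3, -0.1});
        u.put(0, new double[] {0.1, 0.5});
        u.put(1, new double[] {0.2, 0.4});

        double[][] aligned = reorder(Arrays.asList(0, 1, 2), u);
        System.out.println(Arrays.deepToString(aligned));
    }
}
```

Only after this step is it safe to compare row i of U against row i of a reference decomposition.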
>
>
> On Sat, Sep 1, 2012 at 9:26 PM, Ahmed Elgohary <aagohary@gmail.com> wrote:
> > - I am using k = 30 and p = 2 so (k+p)<99 (Rank(A))
> > - I am attaching the csv file of the matrix A
> > - yes, the difference is significant. Here is the output of the sequential
> > SSVD:
> > u(1,1:10):
> > -0.0987    0.1334    0.1676   -0.0251   -0.2201   -0.0629   -0.0601   -0.0575    0.0079    0.0519
> >
> > and the output of matlab's svd:
> > u(1,1:10):
> > -0.0987   -0.1320    0.1662    0.0492   -0.1828    0.1156   -0.0678    0.0504   -0.0160    0.0350
> >
> > and the output of mahout's SSVD:
> > u(1,1:10):
> >  0.0962    0.1924   -0.2125    0.1668   -0.0188    0.0867   -0.0908    0.0264   -0.0443    0.0207
> >
> >
> > - The code I am using is like this:
> > //convert A.csv to a sequencefile /data/A
> > SSVDSolver ssvdSolver = new SSVDSolver(new Configuration(),
> >     new Path[] { new Path("/data/A") }, new Path("/ssvd/output"), 1000, 30, 2, 1);
> > ssvdSolver.setQ(1);
> > ssvdSolver.run();
> > //convert ssvdSolver.getUPath() to a csv file
> >
> > I am not sure what you mean:
> >
> > "Did you account for the fact that your matrix is small enough that it
> > probably wasn't divided correctly?"
> >
> > --ahmed
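
One caveat when comparing outputs like the ones above: each singular vector is only determined up to sign, so a column of U from one solver may be the negative of the same column from another. A minimal sketch of a sign-aware comparison follows. It works on whole columns (not on a single row such as u(1,1:10)), the class and method names are hypothetical, and the matrices in main are toy data rather than the posted results.

```java
// Hypothetical helper; not part of Mahout or MATLAB.
final class SignAlignedCompare {

    /**
     * Largest absolute entry-wise difference between column col of u and of ref,
     * after flipping the sign of u's column if its dot product with ref's is negative.
     */
    static double columnDifference(double[][] u, double[][] ref, int col) {
        double dot = 0.0;
        for (int i = 0; i < u.length; i++) {
            dot += u[i][col] * ref[i][col];
        }
        double sign = dot < 0 ? -1.0 : 1.0;

        double maxDiff = 0.0;
        for (int i = 0; i < u.length; i++) {
            maxDiff = Math.max(maxDiff, Math.abs(sign * u[i][col] - ref[i][col]));
        }
        return maxDiff;
    }

    public static void main(String[] args) {
        // Toy data: the first column of u is the negative of the first column of ref.
        double[][] u = {{0.6, -0.8}, {0.8, 0.6}};
        double[][] ref = {{-0.6, -0.8}, {-0.8, 0.6}};
        for (int col = 0; col < 2; col++) {
            System.out.println("column " + col + ": " + columnDifference(u, ref, col));
        }
    }
}
```

A difference that stays large after sign alignment (and after matching rows by key) points to a genuine disagreement rather than a sign convention.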
> >
> >
> > On Sat, Sep 1, 2012 at 10:52 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >>
> >> No, it's zero mean uniform of course. A murmur scaled to -1...1 range.
> >>
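
For illustration, a minimal sketch of the difference between a raw uniform sample on [0, 1) and one rescaled to [-1, 1). It uses java.util.Random as a stand-in for the murmur hash mentioned above, which is an assumption made for brevity; the class and method names are hypothetical.

```java
import java.util.Random;

// Hypothetical illustration; java.util.Random stands in for the murmur-based generator.
final class ZeroMeanUniform {

    /** Maps a uniform sample from [0, 1) to [-1, 1), whose mean is 0. */
    static double rescale(double unit) {
        return 2.0 * unit - 1.0;
    }

    /** Average of n samples, either raw on [0, 1) or rescaled to [-1, 1). */
    static double sampleMean(long seed, int n, boolean rescaled) {
        Random rnd = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double u = rnd.nextDouble();
            sum += rescaled ? rescale(u) : u;
        }
        return sum / n;
    }

    public static void main(String[] args) {
        System.out.println("raw mean      ~ " + sampleMean(42L, 100_000, false));
        System.out.println("rescaled mean ~ " + sampleMean(42L, 100_000, true));
    }
}
```

The raw samples average close to 0.5 while the rescaled ones average close to 0, which is the property in question for the random projection matrix.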
> >> I used to use normal too but you advised there was not much difference
> >> and i actually did not see much either.
> >>
> >> I also think that in this case me moving the input to R via decimals
> >> actually created precision errors too. I will double check. And my
> >> synthetic test input has a flat tail in the lower singular numbers which
> >> of course messes up some singular vectors in the tail but doesn't affect
> >> singular values. I will check for these things and look again. But i don't
> >> see a fundamental problem with the results i see, they are the same down
> >> to the eighth digit after the dot, so there is no fundamental problem here.
> >>  On Sep 1, 2012 1:03 AM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
> >>
> >> > Oho...
> >> >
> >> > If the uniform randoms have non-zero means, then this could be a
> >> > significant effect that leads to some loss of significance in the
> >> > results.
> >> > For small matrices the resulting difference shouldn't be huge but it
> >> > might well be observable.
> >> >
> >> > On Sat, Sep 1, 2012 at 3:45 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >> >
> >> > > sorry, i meant  "random trinary"
> >> > >
> >> > > On Sat, Sep 1, 2012 at 12:39 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >> > > > Hm. there is slight error between R full rank SVD and Mahout MR SSVD
> >> > > > for my unit test modified for 100x100 k= 3 p=10.
> >> > > >
> >> > > > First left vector (R/SSVD) :
> >> > > >> s$u[,1]
> >> > > >   [1] -0.050741660 -0.083985411  0.078767108 -0.044487425 -0.010380367
> >> > > >   [6]  0.069635451  0.158337400  0.029102044 -0.168156173 -0.127921554
> >> > > >  [11]  0.012698809 -0.027140724  0.069357925 -0.015605283  0.076614201
> >> > > >  [16] -0.158582188  0.143656275  0.033886221 -0.055111330 -0.029299261
> >> > > >  [21]  0.059667350  0.039205405  0.042027376  0.048541162  0.158267382
> >> > > >  [
