mahout-user mailing list archives

From Ahmed Elgohary <aagoh...@gmail.com>
Subject Re: SSVD Wrong Singular Vectors
Date Sun, 02 Sep 2012 04:26:55 GMT
- I am using k = 30 and p = 2, so k + p = 32 < 99 = rank(A).
- I am attaching the CSV file of the matrix A.
- Yes, the difference is significant. Here is the output of the sequential
SSVD:
u(1,1:10):
  -0.0987    0.1334    0.1676   -0.0251   -0.2201   -0.0629   -0.0601   -0.0575    0.0079    0.0519

and the output of MATLAB's svd:
u(1,1:10):
  -0.0987   -0.1320    0.1662    0.0492   -0.1828    0.1156   -0.0678    0.0504   -0.0160    0.0350

and the output of Mahout's SSVD:
u(1,1:10):
   0.0962    0.1924   -0.2125    0.1668   -0.0188    0.0867   -0.0908    0.0264   -0.0443    0.0207


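- The code I am using looks like this:
// needs org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.Path and
// org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver
// convert A.csv to a SequenceFile at /data/A
SSVDSolver ssvdSolver = new SSVDSolver(new Configuration(),
    new Path[] { new Path("/data/A") }, new Path("/ssvd/output"),
    1000, 30, 2, 1); // A block height 1000, k = 30, p = 2, 1 reduce task
ssvdSolver.setQ(1); // q = 1 power iteration
ssvdSolver.run();
// convert ssvdSolver.getUPath() to a csv file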
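
When comparing the converted U files, keep in mind that each singular vector is only determined up to sign, so two correct implementations can return columns that are negatives of each other. Below is a minimal sketch of a sign-aligned comparison, assuming both U matrices have already been loaded into double[][] arrays with matching column order. The class and method names are hypothetical and not part of Mahout; the sample numbers in main are the first three entries of s$u[,1] and U[,1] quoted further down in this thread.

```java
/** Hypothetical helper, not part of Mahout: compares singular vectors up to sign. */
public class SignAlignedCompare {

    /** Dot product of column j of a with column j of b. */
    public static double columnDot(double[][] a, double[][] b, int j) {
        double dot = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i][j] * b[i][j];
        }
        return dot;
    }

    /**
     * Largest absolute entry-wise difference between column j of a and
     * column j of b, after flipping b's sign when the two columns point
     * in opposite directions (negative dot product).
     */
    public static double maxAbsDiffUpToSign(double[][] a, double[][] b, int j) {
        double sign = columnDot(a, b, j) < 0 ? -1.0 : 1.0;
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i][j] - sign * b[i][j]));
        }
        return max;
    }

    public static void main(String[] args) {
        // First three entries of the first left singular vector from R
        // and from Mahout, as quoted in this thread.
        double[][] r = { { -0.050741660 }, { -0.083985411 }, { 0.078767108 } };
        double[][] mahout = { { 0.050741634 }, { 0.083985464 }, { -0.078767344 } };
        System.out.println(maxAbsDiffUpToSign(r, mahout, 0));
    }
}
```

For those three entries the aligned difference is on the order of 1e-7, whereas a naive entry-wise difference would be about twice the magnitude of the entries themselves. If columns still disagree after sign alignment, the difference is real and not just a sign convention.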

I am not sure what you mean by this:
"Did you account for the fact that your matrix is small enough that it
probably wasn't divided correctly?"

--ahmed

On Sat, Sep 1, 2012 at 10:52 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> No, it's zero-mean uniform of course. A murmur hash scaled to the -1...1 range.
>
> I used to use normal too, but you advised there was not much difference, and
> I actually did not see much either.
>
> I also think that in this case, my moving the input to R via decimals
> actually created precision errors too. I will double-check. And my
> synthetic test input has a flat tail in the lower singular values, which of
> course messes up some singular vectors in the tail but doesn't affect the
> singular values. I will check for these things and look again. But I don't
> see a fundamental problem with the results I see: they are the same down to
> the eighth digit after the decimal point.
>  On Sep 1, 2012 1:03 AM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>
> > Oho...
> >
> > If the uniform randoms have non-zero means, then this could be a
> > significant effect that leads to some loss of significance in the results.
> > For small matrices the resulting difference shouldn't be huge, but it might
> > well be observable.
> >
> > On Sat, Sep 1, 2012 at 3:45 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> > wrote:
> >
> > > sorry, i meant  "random trinary"
> > >
> > > On Sat, Sep 1, 2012 at 12:39 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> > > wrote:
> > > > > Hm. There is a slight error between R's full-rank SVD and Mahout's MR
> > > > > SSVD for my unit test, modified for 100x100, k = 3, p = 10.
> > > >
> > > > First left vector (R/SSVD) :
> > > >> s$u[,1]
> > > >   [1] -0.050741660 -0.083985411  0.078767108 -0.044487425
> -0.010380367
> > > >   [6]  0.069635451  0.158337400  0.029102044 -0.168156173
> -0.127921554
> > > >  [11]  0.012698809 -0.027140724  0.069357925 -0.015605283
>  0.076614201
> > > >  [16] -0.158582188  0.143656275  0.033886221 -0.055111330
> -0.029299261
> > > >  [21]  0.059667350  0.039205405  0.042027376  0.048541162
>  0.158267382
> > > >  [26] -0.045441433  0.044529295 -0.038681358 -0.024035611
> -0.054543123
> > > >  [31]  0.027365365 -0.054029635 -0.021845631  0.053124795
>  0.050475680
> > > >  [36] -0.093776477  0.094699229 -0.030911885 -0.169810667
>  0.149075410
> > > >  [41]  0.102150407  0.165651229  0.175798233 -0.048390507
>  0.175243690
> > > >  [46] -0.170793896  0.059918820 -0.132466003 -0.131783388
> -0.178422266
> > > >  [51]  0.079304233 -0.054428953  0.057820900  0.120791505
>  0.095287617
> > > >  [56]  0.036671894 -0.081203386  0.153768112  0.014849405
>  0.027470798
> > > >  [61] -0.064944829 -0.007538214  0.069034637 -0.133978151
> -0.022290433
> > > >  [66] -0.038094067  0.168947231 -0.100797474 -0.054253041
> -0.040255069
> > > >  [71]  0.124817481 -0.059689202  0.018821181 -0.131237426
> -0.141223359
> > > >  [76]  0.128026731 -0.170388319  0.080445852  0.071966615
> -0.029745918
> > > >  [81]  0.049479520 -0.121362268 -0.077338205 -0.061950828
> -0.168851635
> > > >  [86] -0.073192796  0.087453086 -0.085166577  0.160026655
> -0.060816556
> > > >  [91]  0.015420973  0.117780809  0.083415819 -0.160806975
>  0.171932591
> > > >  [96]  0.170064367  0.001479280 -0.161878123  0.129685305
> -0.104231610
> > > >> U[,1]
> > > >            1            2            3            4            5
> > >    6
> > > >  0.050741634  0.083985464 -0.078767344  0.044487660  0.010380470
> > > -0.069635561
> > > >            7            8            9           10           11
> > >   12
> > > > -0.158337117 -0.029102012  0.168156073  0.127921760 -0.012698756
> > >  0.027140487
> > > >           13           14           15           16           17
> > >   18
> > > > -0.069358074  0.015605295 -0.076614050  0.158582091 -0.143656127
> > > -0.033886485
> > > >           19           20           21           22           23
> > >   24
> > > >  0.055111560  0.029299084 -0.059667201 -0.039205182 -0.042027356
> > > -0.048541087
> > > >           25           26           27           28           29
> > >   30
> > > > -0.158267335  0.045441521 -0.044529241  0.038681577  0.024035604
> > >  0.054543106
> > > >           31           32           33           34           35
> > >   36
> > > > -0.027365256  0.054029674  0.021845620 -0.053124833 -0.050475677
> > >  0.093776656
> > > >           37           38           39           40           41
> > >   42
> > > > -0.094699463  0.030911730  0.169810791 -0.149075076 -0.102150266
> > > -0.165651017
> > > >           43           44           45           46           47
> > >   48
> > > > -0.175798375  0.048390265 -0.175243708  0.170793758 -0.059918703
> > >  0.132465938
> > > >           49           50           51           52           53
> > >   54
> > > >  0.131783579  0.178422152 -0.079304282  0.054428751 -0.057820999
> > > -0.120791565
> > > >           55           56           57           58           59
> > >   60
> > > > -0.095287586 -0.036671995  0.081203324 -0.153767938 -0.014849361
> > > -0.027471027
> > > >           61           62           63           64           65
> > >   66
> > > >  0.064944979  0.007538413 -0.069034788  0.133978044  0.022290513
> > >  0.038094051
> > > >           67           68           69           70           71
> > >   72
> > > > -0.168947352  0.100797649  0.054253165  0.040255237 -0.124817480
> > >  0.059689502
> > > >           73           74           75           76           77
> > >   78
> > > > -0.018821295  0.131237429  0.141223597 -0.128027116  0.170388135
> > > -0.080445760
> > > >           79           80           81           82           83
> > >   84
> > > > -0.071966482  0.029745819 -0.049479559  0.121362303  0.077338278
> > >  0.061950724
> > > >           85           86           87           88           89
> > >   90
> > > >  0.168851648  0.073193002 -0.087453189  0.085166809 -0.160026464
> > >  0.060816590
> > > >           91           92           93           94           95
> > >   96
> > > > -0.015421147 -0.117780975 -0.083415727  0.160806958 -0.171932343
> > > -0.170064514
> > > >           97           98           99          100
> > > > -0.001479434  0.161878089 -0.129685379  0.104231530
> > > >
> > > > > Same thing for the right singular vectors. The only thing is that they
> > > > > seem to change sign between R and Mahout's version, but otherwise
> > > > > they fit more or less exactly.
> > > > >
> > > > > So yeah, I am seeing some stochastic effects in these for k and p being
> > > > > so low -- so are you saying your errors are greater than those? I did
> > > > > not test the sequential version with similar parameters.
> > > > >
> > > > > One significant difference between the MR and sequential versions is
> > > > > that the sequential version uses a ternary random matrix (instead of a
> > > > > uniform one); perhaps that may affect accuracy a little bit.
> > > >
> > > > > On Fri, Aug 31, 2012 at 10:55 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > >> Can you provide your test code?
> > > >>
> > > >> What difference did you observe?
> > > >>
> > > > >> Did you account for the fact that your matrix is small enough that it
> > > > >> probably wasn't divided correctly?
> > > > >>
> > > > >> On Sat, Sep 1, 2012 at 1:27 AM, Ahmed Elgohary <aagohary@gmail.com> wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > > >>> I used Mahout's stochastic SVD implementation to find the singular
> > > > >>> values and the singular vectors of a small 99x100 matrix. Then I
> > > > >>> compared the results to the singular values and singular vectors
> > > > >>> obtained using the svd function in MATLAB and the single-threaded
> > > > >>> version of the SSVD. I got pretty much the same singular values using
> > > > >>> the 3 implementations; however, the singular vectors of Mahout's SSVD
> > > > >>> were significantly different. I tried multiple values for the
> > > > >>> parameters P and Q, but that does not seem to solve the problem. Does
> > > > >>> the MR implementation of the SSVD do extra approximations over the
> > > > >>> single-threaded SSVD, so that their results might not be the same? Any
> > > > >>> advice on how I can tune Mahout's SSVD to get the same singular vectors
> > > > >>> as the single-threaded SSVD?
> > > >>>
> > > >>> thanks,
> > > >>>
> > > >>> --ahmed
> > > >>>
> > >
> >
>
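
The quoted messages contrast two kinds of random projection matrix: zero-mean uniform entries in the -1...1 range (the MR version) and ternary entries (the sequential version). The sketch below only illustrates that both choices are zero-mean; it is not Mahout's code. It assumes java.util.Random in place of the murmur hash mentioned above, and the 1/6, 2/3, 1/6 weights for the ternary values -1, 0, +1 are an assumption, since the thread does not state them. The 100 x 32 shape corresponds to the 100 columns of A and k + p = 32.

```java
import java.util.Random;

/** Hypothetical sketch, not Mahout's code: two zero-mean random matrices. */
public class RandomProjectionSketch {

    /** Entries drawn uniformly from [-1, 1). */
    public static double[][] uniform(int rows, int cols, long seed) {
        Random rnd = new Random(seed);
        double[][] omega = new double[rows][cols];
        for (double[] row : omega) {
            for (int j = 0; j < cols; j++) {
                row[j] = 2.0 * rnd.nextDouble() - 1.0;
            }
        }
        return omega;
    }

    /** Entries in {-1, 0, +1}; the 1/6, 2/3, 1/6 weights are an assumption. */
    public static double[][] ternary(int rows, int cols, long seed) {
        Random rnd = new Random(seed);
        double[][] omega = new double[rows][cols];
        for (double[] row : omega) {
            for (int j = 0; j < cols; j++) {
                int draw = rnd.nextInt(6); // 0..5, each equally likely
                row[j] = draw == 0 ? -1.0 : (draw == 5 ? 1.0 : 0.0);
            }
        }
        return omega;
    }

    /** Mean of all entries; close to 0 for both generators on large inputs. */
    public static double mean(double[][] m) {
        double sum = 0.0;
        int n = 0;
        for (double[] row : m) {
            for (double v : row) {
                sum += v;
                n++;
            }
        }
        return sum / n;
    }

    public static void main(String[] args) {
        System.out.println("uniform mean: " + mean(uniform(100, 32, 42L)));
        System.out.println("ternary mean: " + mean(ternary(100, 32, 42L)));
    }
}
```

A generator with a non-zero mean, for example uniform on 0...1, would bias every column of the projection in the same direction, which is the effect raised earlier in the thread.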
