mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: SSVD Wrong Singular Vectors
Date Sun, 02 Sep 2012 05:35:35 GMT
I'll take a look, although it may take me a while to find time.

I have an SSVD flow with power iterations in R, and I collate results
from that and the Java version. I don't immediately have code to convert
from CSV/R to Mahout (only in the reverse direction), which is a shame;
I should've made more progress on R and Mahout integration. Which is why
it will take me time.

Indeed, SSVD is for large matrices and we did accuracy comparisons on
large data sets (e.g. Wikipedia). The LLN (law of large numbers) doesn't
work as well on small matrices.

However, if I find that the uniform 0-mean distribution is problematic
for small problems, I may want to do a patch to address that.
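
A minimal sketch of what "uniform 0-mean" means here, in plain Java. It
is not Mahout's generator (the thread only says it is a murmur hash
scaled to the -1...1 range); java.util.Random, the class name and the
helper names are assumptions made for illustration. It rescales uniform
[0, 1) samples to [-1, 1) and prints the sample mean of both, for a
sample count comparable to a 100 x (k+p) random matrix with k+p = 32.

```java
import java.util.Random;

class UniformMeanCheck {
    // Maps a uniform sample in [0, 1) onto [-1, 1), whose expected value is 0.
    static double toZeroMean(double u01) {
        return 2.0 * u01 - 1.0;
    }

    // Arithmetic mean of the samples.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42L);   // fixed seed, illustration only
        int n = 100 * 32;               // assumed size: 100 rows, k + p = 32
        double[] raw = new double[n];
        double[] centered = new double[n];
        for (int i = 0; i < n; i++) {
            raw[i] = rnd.nextDouble();          // uniform on [0, 1), mean 0.5
            centered[i] = toZeroMean(raw[i]);   // uniform on [-1, 1), mean 0
        }
        System.out.println("mean of [0,1) samples:  " + mean(raw));
        System.out.println("mean of [-1,1) samples: " + mean(centered));
    }
}
```

With so few samples the centered mean will be close to zero but not
exactly zero, which is the small-matrix effect under discussion.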

Also, the fact that your data has a flat spectrum in the tail adds to
the error a lot. I.e. what you have is one principal direction and a lot
of random noise around it, and like I said before, random data is not
going to produce good results beyond that one principal direction.
Detecting a trend in what is mostly noise does not work well with
this method, esp. in conjunction with so few samples.

But there is still a concern, in the sense that power iterations
should've helped more than they did. I'll take a closer look, but it
will take me a while to figure out if there's something we can improve
here.

One thing is the way to read the U matrix: strictly speaking, the rows
of the U matrix do not necessarily come in the same order as the rows of
A in the final output. But they are keyed by the same keys (so it is
possible that what you think is U[1,] is actually something else). But
it will take me some time to verify that.
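
A minimal sketch of that check in plain Java, with no Mahout
dependency. The class and method names are hypothetical, and it assumes
the rows of U have already been read into a key -> row map. It puts the
rows in key order before comparing, and compares a column only up to
sign, since the R and Mahout vectors quoted below differ by a sign flip.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class UAlign {
    // Hypothetical helper: returns the rows of U ordered by ascending row key.
    static double[][] rowsInKeyOrder(Map<Integer, double[]> keyedRows) {
        TreeMap<Integer, double[]> sorted = new TreeMap<>(keyedRows);
        return sorted.values().toArray(new double[0][]);
    }

    // Largest absolute difference between column `col` of a and b,
    // after flipping b's sign if the two columns point in opposite directions.
    static double columnDiffUpToSign(double[][] a, double[][] b, int col) {
        double dot = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i][col] * b[i][col];
        }
        double sign = dot < 0 ? -1.0 : 1.0;
        double maxDiff = 0.0;
        for (int i = 0; i < a.length; i++) {
            maxDiff = Math.max(maxDiff, Math.abs(a[i][col] - sign * b[i][col]));
        }
        return maxDiff;
    }

    public static void main(String[] args) {
        // First three entries of U[,1] from the Mahout run quoted below,
        // inserted out of order to mimic unordered job output.
        Map<Integer, double[]> mahout = new HashMap<>();
        mahout.put(3, new double[] { -0.078767344 });
        mahout.put(1, new double[] { 0.050741634 });
        mahout.put(2, new double[] { 0.083985464 });

        // First three entries of s$u[,1] from the R run quoted below.
        double[][] r = { { -0.050741660 }, { -0.083985411 }, { 0.078767108 } };

        double[][] u = rowsInKeyOrder(mahout);
        System.out.println("max |diff| up to sign: " + columnDiffUpToSign(u, r, 0));
    }
}
```

Comparing without the key sort or without the sign alignment would
report large differences even when the two decompositions agree.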


On Sat, Sep 1, 2012 at 9:26 PM, Ahmed Elgohary <aagohary@gmail.com> wrote:
> - I am using k = 30 and p = 2 so (k+p)<99 (Rank(A))
> - I am attaching the csv file of the matrix A
> - yes, the difference is significant. Here is the output of the sequential
> SSVD:
> u(1,1:10):
> -0.0987    0.1334    0.1676   -0.0251   -0.2201   -0.0629   -0.0601
> -0.0575    0.0079    0.0519
>
> and the output of matlab's svd:
> u(1,1:10):
>   -0.0987   -0.1320    0.1662    0.0492   -0.1828    0.1156   -0.0678
> 0.0504   -0.0160    0.0350
>
> and the output of mahout's SSVD:
> u(1,1:10):
>   0.0962    0.1924   -0.2125    0.1668   -0.0188    0.0867   -0.0908
> 0.0264   -0.0443    0.0207
>
>
> - The code I am using is like this:
> //convert A.csv to a sequencefile /data/A
> SSVDSolver ssvdSolver = new SSVDSolver(new Configuration(),
>     new Path[] { new Path("/data/A") }, new Path("/ssvd/output"),
>     1000, 30, 2, 1);
> ssvdSolver.setQ(1);
> ssvdSolver.run();
> //convert ssvdSolver.getUPath() to a csv file
>
> I am not sure what you mean:
>
> "Did you account for the fact that your matrix is small enough that it
> probably wasn't divided correctly?"
>
> --ahmed
>
>
> On Sat, Sep 1, 2012 at 10:52 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>
>> No, it's zero-mean uniform of course. A murmur scaled to the -1...1 range.
>>
>> I used to use normal too, but you advised there was not much difference
>> and I actually did not see much either.
>>
>> I also think that in this case my moving the input to R via decimals
>> actually created precision errors too. I will double check. And my
>> synthetic test input has a flat tail in the lower singular numbers, which
>> of course messes up some singular vectors in the tail but doesn't affect
>> singular values. I will check for these things and look again. But I don't
>> see a fundamental problem with the results I see; they are the same down
>> to the eighth digit after the dot, so there is no fundamental problem here.
>>  On Sep 1, 2012 1:03 AM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>>
>> > Oho...
>> >
>> > If the uniform randoms have non-zero means, then this could be a
>> > significant effect that leads to some loss of significance in the
>> > results. For small matrices the resulting difference shouldn't be huge
>> > but it might well be observable.
>> >
>> > On Sat, Sep 1, 2012 at 3:45 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >
>> > > sorry, i meant  "random trinary"
>> > >
>> > > On Sat, Sep 1, 2012 at 12:39 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> > > > Hm. there is slight error between R full rank SVD and Mahout MR SSVD
>> > > > for my unit test modified for 100x100 k= 3 p=10.
>> > > >
>> > > > First left vector (R/SSVD) :
>> > > >> s$u[,1]
>> > > >   [1] -0.050741660 -0.083985411  0.078767108 -0.044487425 -0.010380367
>> > > >   [6]  0.069635451  0.158337400  0.029102044 -0.168156173 -0.127921554
>> > > >  [11]  0.012698809 -0.027140724  0.069357925 -0.015605283  0.076614201
>> > > >  [16] -0.158582188  0.143656275  0.033886221 -0.055111330 -0.029299261
>> > > >  [21]  0.059667350  0.039205405  0.042027376  0.048541162  0.158267382
>> > > >  [26] -0.045441433  0.044529295 -0.038681358 -0.024035611 -0.054543123
>> > > >  [31]  0.027365365 -0.054029635 -0.021845631  0.053124795  0.050475680
>> > > >  [36] -0.093776477  0.094699229 -0.030911885 -0.169810667  0.149075410
>> > > >  [41]  0.102150407  0.165651229  0.175798233 -0.048390507  0.175243690
>> > > >  [46] -0.170793896  0.059918820 -0.132466003 -0.131783388 -0.178422266
>> > > >  [51]  0.079304233 -0.054428953  0.057820900  0.120791505  0.095287617
>> > > >  [56]  0.036671894 -0.081203386  0.153768112  0.014849405  0.027470798
>> > > >  [61] -0.064944829 -0.007538214  0.069034637 -0.133978151 -0.022290433
>> > > >  [66] -0.038094067  0.168947231 -0.100797474 -0.054253041 -0.040255069
>> > > >  [71]  0.124817481 -0.059689202  0.018821181 -0.131237426 -0.141223359
>> > > >  [76]  0.128026731 -0.170388319  0.080445852  0.071966615 -0.029745918
>> > > >  [81]  0.049479520 -0.121362268 -0.077338205 -0.061950828 -0.168851635
>> > > >  [86] -0.073192796  0.087453086 -0.085166577  0.160026655 -0.060816556
>> > > >  [91]  0.015420973  0.117780809  0.083415819 -0.160806975  0.171932591
>> > > >  [96]  0.170064367  0.001479280 -0.161878123  0.129685305 -0.104231610
>> > > >> U[,1]
>> > > >            1            2            3            4            5            6
>> > > >  0.050741634  0.083985464 -0.078767344  0.044487660  0.010380470 -0.069635561
>> > > >            7            8            9           10           11           12
>> > > > -0.158337117 -0.029102012  0.168156073  0.127921760 -0.012698756  0.027140487
>> > > >           13           14           15           16           17           18
>> > > > -0.069358074  0.015605295 -0.076614050  0.158582091 -0.143656127 -0.033886485
>> > > >           19           20           21           22           23           24
>> > > >  0.055111560  0.029299084 -0.059667201 -0.039205182 -0.042027356 -0.048541087
>> > > >           25           26           27           28           29           30
>> > > > -0.158267335  0.045441521 -0.044529241  0.038681577  0.024035604  0.054543106
>> > > >           31           32           33           34           35           36
>> > > > -0.027365256  0.054029674  0.021845620 -0.053124833 -0.050475677  0.093776656
>> > > >           37           38           39           40           41           42
>> > > > -0.094699463  0.030911730  0.169810791 -0.149075076 -0.102150266 -0.165651017
>> > > >           43           44           45           46           47           48
>> > > > -0.175798375  0.048390265 -0.175243708  0.170793758 -0.059918703  0.132465938
>> > > >           49           50           51           52           53           54
>> > > >  0.131783579  0.178422152 -0.079304282  0.054428751 -0.057820999 -0.120791565
>> > > >           55           56           57           58           59           60
>> > > > -0.095287586 -0.036671995  0.081203324 -0.153767938 -0.014849361 -0.027471027
>> > > >           61           62           63           64           65           66
>> > > >  0.064944979  0.007538413 -0.069034788  0.133978044  0.022290513  0.038094051
>> > > >           67           68           69           70           71           72
>> > > > -0.168947352  0.100797649  0.054253165  0.040255237 -0.124817480  0.059689502
>> > > >           73           74           75           76           77           78
>> > > > -0.018821295  0.131237429  0.141223597 -0.128027116  0.170388135 -0.080445760
>> > > >           79           80           81           82           83           84
>> > > > -0.071966482  0.029745819 -0.049479559  0.121362303  0.077338278  0.061950724
>> > > >           85           86           87           88           89           90
>> > > >  0.168851648  0.073193002 -0.087453189  0.085166809 -0.160026464  0.060816590
>> > > >           91           92           93           94           95           96
>> > > > -0.015421147 -0.117780975 -0.083415727  0.160806958 -0.171932343 -0.170064514
>> > > >           97           98           99          100
>> > > > -0.001479434  0.161878089 -0.129685379  0.104231530
>> > > >
>> > > > Same thing for the right singular vectors. The only thing is that they
>> > > > seem to change the sign between R and Mahout's version but otherwise
>> > > > they fit more or less exactly.
>> > > >
>> > > > So yeah i am seeing some stochastic effects in these for k and p being
>> > > > so low -- so are you saying your errors are greater than those? I did
>> > > > not test sequential version with similar parameters.
>> > > >
>> > > > One significant difference between MR and sequential version is that
>> > > > sequential version is using ternary random matrix (instead of uniform
>> > > > one), perhaps that may affect accuracy a little bit.
>> > > >
>> > > > On Fri, Aug 31, 2012 at 10:55 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> > > >> Can you provide your test code?
>> > > >>
>> > > >> What difference did you observe?
>> > > >>
>> > > >> Did you account for the fact that your matrix is small enough that it
>> > > >> probably wasn't divided correctly?
>> > > >>
>> > > >> On Sat, Sep 1, 2012 at 1:27 AM, Ahmed Elgohary <aagohary@gmail.com> wrote:
>> > > >>
>> > > >>> Hi,
>> > > >>>
>> > > >>> I used mahout's stochastic svd implementation to find the singular
>> > > >>> values and the singular vectors of a small matrix 99x100. Then, I
>> > > >>> compared the results to the singular values and the singular vectors
>> > > >>> obtained using the svd function in matlab and the single threaded
>> > > >>> version of the ssvd. I got pretty much the same singular values using
>> > > >>> the 3 implementations. However, the singular vectors of mahout's ssvd
>> > > >>> were significantly different. I tried multiple values for the
>> > > >>> parameters P and Q, but that does not seem to solve the problem. Does
>> > > >>> the MR implementation of the SSVD do extra approximations over the
>> > > >>> single threaded ssvd so their results might not be the same? Any
>> > > >>> advice how I can tune mahout's ssvd to get the same singular vectors
>> > > >>> as the single threaded ssvd?
>> > > >>>
>> > > >>> thanks,
>> > > >>>
>> > > >>> --ahmed
>> > > >>>
>> > >
>> >
>
>
