mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Singular Value Decomposition does not return correct eigenvalues and -vectors
Date Sat, 24 Sep 2011 03:46:20 GMT
I already fixed the full-rank case (p=0) on trunk. It was just an invalid
assertion; the algorithm itself doesn't impose that limit. So k=3, p=0 should
work on trunk now.
On Sep 23, 2011 8:34 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
> Markus,
>
> Try testing on a 20x20 matrix if you want to use p>0. The issue is that
> this is an approximation algorithm that works for reasonably high
> dimension. 3 is not reasonably high. 20 is probably marginal.
>
> On Fri, Sep 23, 2011 at 4:42 PM, Dmitriy Lyubimov <dlyubimov@apache.org>
> wrote:
>
>> oh, ok, apparently you need to use p>0.
>>
>> but then the problem is that there's a k+p >= m (input height)
>> requirement, so I guess this is a corner case I did not account for.
>>
>> you can use k=2 and p=1; the caveat is that even though 3 singular
>> values will be computed, only 2 of them will be saved. this solver
>> always assumes a "thin" decomposition requirement, although the
>> distinction is purely technical; it is only a matter of a patch to
>> enable p=0.
>>
>> It only comes up because your input is so small. In practice, input is
>> much "longer" than k+p rows so it hasn't come up as an issue. Point
>> is, it will not do full rank decomposition with small matrices; but
>> then, you don't want to use it with small matrices :)
>>
>> although I can engineer a patch to allow p=0 and full rank
>> decompositions for short wide matrices if it is that important.
>>
>> -dmitriy
>>
>> On Fri, Sep 23, 2011 at 3:42 PM, Markus Holtermann
>> <info@markusholtermann.eu> wrote:
>> > Thank you for all your responses.
>> >
>> > ref. Dan Brickley:
>> > ------------------
>> > hopefully you did dream ;-)
>> >
>> > ref. Dmitriy Lyubimov:
>> > ----------------------
>> > When I run `mahout ssvd -i A.seq -o A-ssvd/ -k 3 -p 0` I get an
>> > IllegalArgumentException. You can find the traceback at
>> > http://paste.pocoo.org/show/481168/ .
>> >
>> > ref. Ted Dunning:
>> > -----------------
>> > I am running the M/R version of SVD in local mode. I didn't install
>> > Hadoop except what is coming via `mvn install`.
>> > If I understand the code correctly, the `--inMemory` argument is only
>> > relevant for the "EigenVerificationJob" -- I didn't run that.
>> >
>> > Here are the latest results for the calculations as described in my
>> > previous mail:
>> >
>> > For 1:
>> > Key class: class org.apache.hadoop.io.IntWritable
>> > Value Class: class org.apache.mahout.math.VectorWritable
>> > Key: 0: Value: eigenVector0, eigenvalue = 11.344411508600611:
>> > {0:0.8940505788976013,1:0.05761556873901637,2:-0.44424543735613486}
>> > Key: 1: Value: eigenVector1, eigenvalue = 0.0:
>> > {0:-0.3030457633656634,1:0.8081220356417685,2:-0.5050762722761053}
>> > Key: 2: Value: eigenVector2, eigenvalue = -0.4362482432944815:
>> > {0:0.3299042704770375,1:0.5861904313011974,2:0.7399621277956934}
>> > Count: 3
>> >
>> > For 2:
>> > Key class: class org.apache.hadoop.io.IntWritable
>> > Value Class: class org.apache.mahout.math.VectorWritable
>> > Key: 0: Value: eigenVector0, eigenvalue = 11.344814282762082:
>> > {0:0.7369762290995766,1:0.3279852776056837,2:-0.5910090485061045}
>> > Key: 1: Value: eigenVector1, eigenvalue = 0.17091518882717976:
>> > {0:0.9225878132457447,1:0.3812202473600341,2:0.05918487858557608}
>> > Key: 2: Value: eigenVector2, eigenvalue = 0.0:
>> > {0:-0.5910090485061055,1:0.7369762290995774,2:-0.3279852776056802}
>> > Key: 3: Value: eigenVector3, eigenvalue = -0.5157294715892533:
>> > {0:-0.32798527760568197,1:-0.5910090485061036,2:-0.7369762290995783}
>> > Count: 4
>> >
>> > For 3:
>> > Key class: class org.apache.hadoop.io.IntWritable
>> > Value Class: class org.apache.mahout.math.VectorWritable
>> > Key: 0: Value: eigenVector0, eigenvalue = 11.344814080004587:
>> > {0:0.2870124314018251,1:-0.8054865010309287,2:0.5184740696291035}
>> > Key: 1: Value: eigenVector1, eigenvalue = 0.4852290375835231:
>> > {0:0.9000472484774761,1:0.041469409433508436,2:-0.4338147514658307}
>> > Key: 2: Value: eigenVector2, eigenvalue = 0.0:
>> > {0:0.3279311127797073,1:0.5911613863727806,2:0.7368781449689461}
>> > Count: 3
>> >
>> > For 4:
>> > Key class: class org.apache.hadoop.io.IntWritable
>> > Value Class: class org.apache.mahout.math.VectorWritable
>> > Key: 0: Value: eigenVector0, eigenvalue = 11.34481428276208:
>> > {0:0.788451139115581,1:0.5058848349238699,2:0.3498933194866569}
>> > Key: 1: Value: eigenVector1, eigenvalue = 0.5157294715892401:
>> > {0:-0.5910090485061453,1:0.7369762290995597,2:-0.32798527760564816}
>> > Key: 2: Value: eigenVector2, eigenvalue = 0.1709151888272022:
>> > {0:-0.7369762290995447,1:-0.3279852776057236,2:0.5910090485061223}
>> > Key: 3: Value: eigenVector3, eigenvalue = 0.0:
>> > {0:-0.3279852776056819,1:-0.5910090485061036,2:-0.7369762290995783}
>> > Count: 4
>> >
>> > For 5:
>> > Key class: class org.apache.hadoop.io.IntWritable
>> > Value Class: class org.apache.mahout.math.VectorWritable
>> > Key: 0: Value: eigenVector0, eigenvalue = 7.7949818262315:
>> > {0:-0.3998289016610171,1:0.3486764982772797,2:0.8476800982361441}
>> > Key: 1: Value: eigenVector1, eigenvalue = 0.0:
>> > {0:0.3244428422615253,1:-0.8111071056538125,2:0.4866642633922878}
>> > Key: 2: Value: eigenVector2, eigenvalue = -2.2686660367578133:
>> > {0:0.8572477421969729,1:0.4696061783100697,2:0.21117846905213422}
>> > Count: 3
>> >
>> > For 6:
>> > Key class: class org.apache.hadoop.io.IntWritable
>> > Value Class: class org.apache.mahout.math.VectorWritable
>> > Key: 0: Value: eigenVector0, eigenvalue = 9.903422603237882:
>> > {0:-0.305869782876591,1:-0.012493432384138303,2:0.9519913813004245}
>> > Key: 1: Value: eigenVector1, eigenvalue = 6.002722238353203:
>> > {0:-0.7781330995244824,1:0.06366543541563939,2:0.624864458709054}
>> > Key: 2: Value: eigenVector2, eigenvalue = 0.0:
>> > {0:0.2988138112963618,1:0.9481291552697455,2:0.10845003967736172}
>> > Key: 3: Value: eigenVector3, eigenvalue = -3.906144841591079:
>> > {0:0.9039656974142156,1:-0.3176397630567398,2:0.2862708487144453}
>> > Count: 4
>> >
>> > For 7:
>> > Key class: class org.apache.hadoop.io.IntWritable
>> > Value Class: class org.apache.mahout.math.VectorWritable
>> > Key: 0: Value: eigenVector0, eigenvalue = 7.04924152040162:
>> > {0:-0.4082482904638631,1:0.8164965809277261,2:-0.4082482904638631}
>> > Key: 1: Value: eigenVector1, eigenvalue = 3.782617346103868:
>> > {0:0.7808892910047764,1:0.08072916428282848,2:-0.6194309624391194}
>> > Key: 2: Value: eigenVector2, eigenvalue = 0.0:
>> > {0:0.47280571964327067,1:0.5716783495703939,2:0.6705509794975171}
>> > Count: 3
>> >
>> > For 8:
>> > Key class: class org.apache.hadoop.io.IntWritable
>> > Value Class: class org.apache.mahout.math.VectorWritable
>> > Key: 0: Value: eigenVector0, eigenvalue = 7.964450219004663:
>> > {0:NaN,1:NaN,2:NaN}
>> > Key: 1: Value: eigenVector1, eigenvalue = 7.000000000000002:
>> > {0:NaN,1:NaN,2:NaN}
>> > Key: 2: Value: eigenVector2, eigenvalue = 0.753347668076679:
>> > {0:NaN,1:NaN,2:NaN}
>> > Key: 3: Value: eigenVector3, eigenvalue = 0.0:
>> > {0:NaN,1:NaN,2:NaN}
>> > Count: 4
>> >
>> >
>> > ref. Danny Bickson:
>> > -------------------
>> > Thanks for your confirmation on how to use the rank.
>> > Regarding the scale factor and orthogonalization: Yes, I take it into
>> > account. I'm running SVD from trunk without any changes. And even after
>> > commenting out those parts of the code, the results are still wrong in
>> > the cases 1, 2, 3, 7 and 8
>> >
>> > Thank you for your help.
>> >
>> > Markus
>> >
>> >
>> >> On 22 Sep 2011, at 18:37, Markus Holtermann
>> >> <info@markusholtermann.eu> wrote:
>> >>
>> >>> Hello there,
>> >>>
>> >>> I'm trying to run Mahout's Singular Value Decomposition but
>> >>> realized that the resulting eigenvalues are wrong in most cases.
>> >>> So I took two small 3x3 matrices and calculated their eigenvalues
>> >>> and eigenvectors by hand and compared the results to Mahout.
>> >>>
>> >>> Only in one of eight cases the results for Mahout and my pen &
>> >>> paper matched.
>> >>>
>> >>> Let's take A = {{1,2,3},{2,4,5},{3,5,6}} and B =
>> >>> {{5,2,4},{-3,6,2},{3,-3,1}}
>> >>>
>> >>> As you can see, A is symmetric, B is not.
>> >>>
>> >>> I ran `mahout svd --output out/ --numRows 3 --numCols 3` eight
>> >>> times with different arguments:
>> >>>
>> >>> 1) --input A --rank 3 --symmetric true   result is wrong
>> >>> 2) --input A --rank 4 --symmetric true   result is wrong
>> >>> 3) --input A --rank 3 --symmetric false  result is wrong
>> >>> 4) --input A --rank 4 --symmetric false  result is CORRECT
>> >>>
>> >>> 5) --input B --rank 3 --symmetric true   result is wrong
>> >>> 6) --input B --rank 4 --symmetric true   result is wrong
>> >>> 7) --input B --rank 3 --symmetric false  result is wrong
>> >>> 8) --input B --rank 4 --symmetric false  result is wrong
>> >>>
>> >>> To verify that my input data is correct, this is the result of
>> >>> `mahout seqdumper`
>> >>>
>> >>> For A:
>> >>> Key class: class org.apache.hadoop.io.IntWritable
>> >>> Value Class: class org.apache.mahout.math.VectorWritable
>> >>> Key: 0: Value: {0:1.0,1:2.0,2:3.0}
>> >>> Key: 1: Value: {0:2.0,1:4.0,2:5.0}
>> >>> Key: 2: Value: {0:3.0,1:5.0,2:6.0}
>> >>> Count: 3
>> >>>
>> >>>
>> >>> For B:
>> >>> Key class: class org.apache.hadoop.io.IntWritable
>> >>> Value Class: class org.apache.mahout.math.VectorWritable
>> >>> Key: 0: Value: {0:5.0,1:2.0,2:4.0}
>> >>> Key: 1: Value: {0:-3.0,1:6.0,2:2.0}
>> >>> Key: 2: Value: {0:3.0,1:-3.0,2:1.0}
>> >>> Count: 3
>> >>>
>> >>>
>> >>> And finally, the correct eigenvalues should be:
>> >>>
>> >>> For A: λ1 = 11.3448, λ2 = -0.515729, λ3 = 0.170915
>> >>>
>> >>> For B: λ1 = 7, λ2 = 3, λ3 = 2
>> >>>
>> >>> So, are there any known bugs in Mahout's SVD implementation? Am I
>> >>> doing something wrong? Is this algorithm known to produce wrong
>> >>> results?
>> >>>
>> >>> Thanks in advance.
>> >>>
>> >>> Markus
>> >
>> >
>>
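
The hand-computed eigenvalues quoted in the thread can be checked without any linear algebra library. The sketch below is not part of Mahout; it is plain Python, and the helper names `det3` and `char_poly` are made up for illustration. It evaluates det(M - λI) at each claimed eigenvalue, which should be zero up to the rounding of the quoted values. It also lists the absolute values of A's eigenvalues, on the assumption (true for any symmetric matrix) that these are its singular values.

```python
# Check candidate eigenvalues of a 3x3 matrix by evaluating det(M - lam*I),
# which is zero (up to rounding) exactly at the eigenvalues of M.
# det3 and char_poly are illustrative helpers, not Mahout functions.

def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly(m, lam):
    """Value of det(m - lam*I) for a 3x3 matrix m."""
    shifted = [[m[r][c] - (lam if r == c else 0.0) for c in range(3)]
               for r in range(3)]
    return det3(shifted)

# Matrices and claimed eigenvalues exactly as quoted in the thread.
A = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]
B = [[5, 2, 4], [-3, 6, 2], [3, -3, 1]]
claimed_A = [11.3448, -0.515729, 0.170915]
claimed_B = [7.0, 3.0, 2.0]

residuals_A = [char_poly(A, lam) for lam in claimed_A]
residuals_B = [char_poly(B, lam) for lam in claimed_B]

# A is symmetric, so its singular values are the absolute eigenvalues.
singular_A = sorted((abs(lam) for lam in claimed_A), reverse=True)

print("residuals for A:", residuals_A)
print("residuals for B:", residuals_B)
print("singular values of A:", singular_A)
```

The residuals for B come out as exactly zero, and those for A are small but nonzero because the quoted eigenvalues are rounded. The absolute values 11.3448, 0.515729 and 0.170915 agree with the nonzero values reported for case 4, the one run marked CORRECT. B is not symmetric, so the same shortcut does not apply to it, and its singular values need not equal 7, 3 and 2.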
