mahout-user mailing list archives

From Dmitriy Lyubimov <dlyubi...@apache.org>
Subject Re: Singular Value Decomposition does not return correct eigenvalues and -vectors
Date Sat, 24 Sep 2011 00:29:00 GMT
Markus, ok, use of p=0 is enabled on the trunk, verified and committed.
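
For reference, the parameter constraint discussed in the quoted message below can be sketched as follows. This is only an illustration, not Mahout code: the function name `check_ssvd_params` is hypothetical, and it reads the requirement as k + p not exceeding the input height m (which is what makes k=2, p=1 valid for a 3-row input) with p=0 allowed, as on trunk now.

```python
def check_ssvd_params(k, p, m):
    """Hypothetical helper (not part of Mahout).

    k: number of singular values to keep
    p: oversampling parameter
    m: input height (number of rows)

    Returns (computed, saved): k + p values are computed, only k are saved.
    Assumes the requirement is k + p <= m and that p = 0 is permitted.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if p < 0:
        raise ValueError("p must not be negative")
    if k + p > m:
        raise ValueError("k + p = %d exceeds input height m = %d" % (k + p, m))
    return (k + p, k)


# 3x3 input from this thread: k=2, p=1 computes 3 values and saves 2.
print(check_ssvd_params(2, 1, 3))
# With p=0 enabled, k=3 gives a full rank decomposition of the 3x3 input.
print(check_ssvd_params(3, 0, 3))
```

With m=3, a call such as `check_ssvd_params(3, 1, 3)` raises a `ValueError` under these assumptions, since k + p would exceed the number of rows.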

On Fri, Sep 23, 2011 at 4:42 PM, Dmitriy Lyubimov <dlyubimov@apache.org> wrote:
> oh, ok, apparently you need to use p>0.
>
> but then there's the k+p <= m (input height) requirement, so I guess
> this is a corner case I did not account for.
>
> you can use k=2 and p=1; the caveat is that even though 3 singular
> values will be computed, only 2 of them will be saved. this solver
> always assumes a "thin" decomposition requirement, although the
> distinction is purely technical; it is only a matter of a patch to
> enable p=0.
>
> It is only an issue because your input is so small. In practice, input
> is much "longer" than k+p rows so it hasn't come up as an issue. Point
> is, it will not do full rank decomposition with small matrices; but
> then, you don't want to use it with small matrices :)
>
> although I can engineer a patch to allow p=0 and full rank
> decompositions for short wide matrices if it is that important.
>
> -dmitriy
>
> On Fri, Sep 23, 2011 at 3:42 PM, Markus Holtermann
> <info@markusholtermann.eu> wrote:
>> Thank you for all your responses.
>>
>> ref. Dan Brickley:
>> ------------------
>> hopefully you did dream ;-)
>>
>> ref. Dmitriy Lyubimov:
>> ----------------------
>> When I run `mahout ssvd -i A.seq -o A-ssvd/ -k 3 -p 0` I get an
>> IllegalArgumentException. You can find the traceback at
>> http://paste.pocoo.org/show/481168/ .
>>
>> ref. Ted Dunning:
>> -----------------
>> I am running the M/R version of SVD in local mode. I didn't install
>> Hadoop beyond what comes in via `mvn install`.
>> If I understand the code correctly, the `--inMemory` argument is only
>> relevant for the "EigenVerificationJob" -- I didn't run that.
>>
>> Here are the latest results for the calculations as described in my
>> previous mail:
>>
>> For 1:
>> Key class: class org.apache.hadoop.io.IntWritable
>> Value Class: class org.apache.mahout.math.VectorWritable
>> Key: 0: Value: eigenVector0, eigenvalue = 11.344411508600611:
>> {0:0.8940505788976013,1:0.05761556873901637,2:-0.44424543735613486}
>> Key: 1: Value: eigenVector1, eigenvalue = 0.0:
>> {0:-0.3030457633656634,1:0.8081220356417685,2:-0.5050762722761053}
>> Key: 2: Value: eigenVector2, eigenvalue = -0.4362482432944815:
>> {0:0.3299042704770375,1:0.5861904313011974,2:0.7399621277956934}
>> Count: 3
>>
>> For 2:
>> Key class: class org.apache.hadoop.io.IntWritable
>> Value Class: class org.apache.mahout.math.VectorWritable
>> Key: 0: Value: eigenVector0, eigenvalue = 11.344814282762082:
>> {0:0.7369762290995766,1:0.3279852776056837,2:-0.5910090485061045}
>> Key: 1: Value: eigenVector1, eigenvalue = 0.17091518882717976:
>> {0:0.9225878132457447,1:0.3812202473600341,2:0.05918487858557608}
>> Key: 2: Value: eigenVector2, eigenvalue = 0.0:
>> {0:-0.5910090485061055,1:0.7369762290995774,2:-0.3279852776056802}
>> Key: 3: Value: eigenVector3, eigenvalue = -0.5157294715892533:
>> {0:-0.32798527760568197,1:-0.5910090485061036,2:-0.7369762290995783}
>> Count: 4
>>
>> For 3:
>> Key class: class org.apache.hadoop.io.IntWritable
>> Value Class: class org.apache.mahout.math.VectorWritable
>> Key: 0: Value: eigenVector0, eigenvalue = 11.344814080004587:
>> {0:0.2870124314018251,1:-0.8054865010309287,2:0.5184740696291035}
>> Key: 1: Value: eigenVector1, eigenvalue = 0.4852290375835231:
>> {0:0.9000472484774761,1:0.041469409433508436,2:-0.4338147514658307}
>> Key: 2: Value: eigenVector2, eigenvalue = 0.0:
>> {0:0.3279311127797073,1:0.5911613863727806,2:0.7368781449689461}
>> Count: 3
>>
>> For 4:
>> Key class: class org.apache.hadoop.io.IntWritable
>> Value Class: class org.apache.mahout.math.VectorWritable
>> Key: 0: Value: eigenVector0, eigenvalue = 11.34481428276208:
>> {0:0.788451139115581,1:0.5058848349238699,2:0.3498933194866569}
>> Key: 1: Value: eigenVector1, eigenvalue = 0.5157294715892401:
>> {0:-0.5910090485061453,1:0.7369762290995597,2:-0.32798527760564816}
>> Key: 2: Value: eigenVector2, eigenvalue = 0.1709151888272022:
>> {0:-0.7369762290995447,1:-0.3279852776057236,2:0.5910090485061223}
>> Key: 3: Value: eigenVector3, eigenvalue = 0.0:
>> {0:-0.3279852776056819,1:-0.5910090485061036,2:-0.7369762290995783}
>> Count: 4
>>
>> For 5:
>> Key class: class org.apache.hadoop.io.IntWritable
>> Value Class: class org.apache.mahout.math.VectorWritable
>> Key: 0: Value: eigenVector0, eigenvalue = 7.7949818262315:
>> {0:-0.3998289016610171,1:0.3486764982772797,2:0.8476800982361441}
>> Key: 1: Value: eigenVector1, eigenvalue = 0.0:
>> {0:0.3244428422615253,1:-0.8111071056538125,2:0.4866642633922878}
>> Key: 2: Value: eigenVector2, eigenvalue = -2.2686660367578133:
>> {0:0.8572477421969729,1:0.4696061783100697,2:0.21117846905213422}
>> Count: 3
>>
>> For 6:
>> Key class: class org.apache.hadoop.io.IntWritable
>> Value Class: class org.apache.mahout.math.VectorWritable
>> Key: 0: Value: eigenVector0, eigenvalue = 9.903422603237882:
>> {0:-0.305869782876591,1:-0.012493432384138303,2:0.9519913813004245}
>> Key: 1: Value: eigenVector1, eigenvalue = 6.002722238353203:
>> {0:-0.7781330995244824,1:0.06366543541563939,2:0.624864458709054}
>> Key: 2: Value: eigenVector2, eigenvalue = 0.0:
>> {0:0.2988138112963618,1:0.9481291552697455,2:0.10845003967736172}
>> Key: 3: Value: eigenVector3, eigenvalue = -3.906144841591079:
>> {0:0.9039656974142156,1:-0.3176397630567398,2:0.2862708487144453}
>> Count: 4
>>
>> For 7:
>> Key class: class org.apache.hadoop.io.IntWritable
>> Value Class: class org.apache.mahout.math.VectorWritable
>> Key: 0: Value: eigenVector0, eigenvalue = 7.04924152040162:
>> {0:-0.4082482904638631,1:0.8164965809277261,2:-0.4082482904638631}
>> Key: 1: Value: eigenVector1, eigenvalue = 3.782617346103868:
>> {0:0.7808892910047764,1:0.08072916428282848,2:-0.6194309624391194}
>> Key: 2: Value: eigenVector2, eigenvalue = 0.0:
>> {0:0.47280571964327067,1:0.5716783495703939,2:0.6705509794975171}
>> Count: 3
>>
>> For 8:
>> Key class: class org.apache.hadoop.io.IntWritable
>> Value Class: class org.apache.mahout.math.VectorWritable
>> Key: 0: Value: eigenVector0, eigenvalue = 7.964450219004663:
>> {0:NaN,1:NaN,2:NaN}
>> Key: 1: Value: eigenVector1, eigenvalue = 7.000000000000002:
>> {0:NaN,1:NaN,2:NaN}
>> Key: 2: Value: eigenVector2, eigenvalue = 0.753347668076679:
>> {0:NaN,1:NaN,2:NaN}
>> Key: 3: Value: eigenVector3, eigenvalue = 0.0:
>> {0:NaN,1:NaN,2:NaN}
>> Count: 4
>>
>>
>> ref. Danny Bickson:
>> -------------------
>> Thanks for your confirmation on how to use the rank.
>> Regarding the scale factor and orthogonalization: Yes, I take it into
>> account. I'm running SVD from trunk without any changes. And even after
>> commenting out those parts of the code, the results are still wrong in
>> the cases 1, 2, 3, 7 and 8
>>
>> Thank you for your help.
>>
>> Markus
>>
>>
>>> On 22 Sep 2011, at 18:37, Markus Holtermann
>>> <info@markusholtermann.eu> wrote:
>>>
>>>> Hello there,
>>>>
>>>> I'm trying to run Mahout's Singular Value Decomposition but
>>>> realized that the resulting eigenvalues are wrong in most cases.
>>>> So I took two small 3x3 matrices and calculated their eigenvalues
>>>> and eigenvectors by hand and compared the results to Mahout.
>>>>
>>>> In only one of eight cases did the results from Mahout match my
>>>> pen & paper results.
>>>>
>>>> Let's take A = {{1,2,3},{2,4,5},{3,5,6}} and B =
>>>> {{5,2,4},{-3,6,2},{3,-3,1}}
>>>>
>>>> As you can see, A is symmetric, B is not.
>>>>
>>>> I ran `mahout svd --output out/ --numRows 3 --numCols 3` eight
>>>> times with different arguments:
>>>>
>>>> 1) --input A --rank 3 --symmetric true    result is wrong
>>>> 2) --input A --rank 4 --symmetric true    result is wrong
>>>> 3) --input A --rank 3 --symmetric false   result is wrong
>>>> 4) --input A --rank 4 --symmetric false   result is CORRECT
>>>>
>>>> 5) --input B --rank 3 --symmetric true    result is wrong
>>>> 6) --input B --rank 4 --symmetric true    result is wrong
>>>> 7) --input B --rank 3 --symmetric false   result is wrong
>>>> 8) --input B --rank 4 --symmetric false   result is wrong
>>>>
>>>> To verify that my input data is correct, this is the result of
>>>> `mahout seqdumper`
>>>>
>>>> For A:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: {0:1.0,1:2.0,2:3.0}
>>>> Key: 1: Value: {0:2.0,1:4.0,2:5.0}
>>>> Key: 2: Value: {0:3.0,1:5.0,2:6.0}
>>>> Count: 3
>>>>
>>>>
>>>> For B:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: {0:5.0,1:2.0,2:4.0}
>>>> Key: 1: Value: {0:-3.0,1:6.0,2:2.0}
>>>> Key: 2: Value: {0:3.0,1:-3.0,2:1.0}
>>>> Count: 3
>>>>
>>>>
>>>> And finally, the correct eigenvalues should be:
>>>>
>>>> For A: λ1 = 11.3448, λ2 = -0.515729, λ3 = 0.170915
>>>>
>>>> For B: λ1 = 7, λ2 = 3, λ3 = 2
>>>>
>>>> So, are there any known bugs in Mahout's SVD implementation? Am I
>>>> doing something wrong? Is this algorithm known to produce wrong
>>>> results?
>>>>
>>>> Thanks in advance.
>>>>
>>>> Markus
>>
>>
>
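
The hand-computed eigenvalues quoted in the original message can be checked without any linear algebra library by plugging them into the characteristic polynomial det(M - λI), which should come out at (or very near) zero for each eigenvalue. The sketch below does that for A and B in plain Python; the helper name `char_poly_3x3` is made up for this illustration. It also compares the values Mahout reported for case 4 with the hand-computed ones for A in absolute value, which is an assumption about how that case was judged correct, given that the sign of the second value differs.

```python
def char_poly_3x3(m, lam):
    """Evaluate det(m - lam * I) for a 3x3 matrix given as nested lists."""
    a = [[m[i][j] - (lam if i == j else 0.0) for j in range(3)]
         for i in range(3)]
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


# Matrices and hand-computed eigenvalues from the original message.
A = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]
B = [[5, 2, 4], [-3, 6, 2], [3, -3, 1]]
A_EIGS = [11.3448, -0.515729, 0.170915]
B_EIGS = [7.0, 3.0, 2.0]

# Residuals of the characteristic polynomial; small values mean the
# quoted numbers are eigenvalues up to the rounding used in the message.
residuals_a = [char_poly_3x3(A, lam) for lam in A_EIGS]
residuals_b = [char_poly_3x3(B, lam) for lam in B_EIGS]
print("residuals for A:", residuals_a)
print("residuals for B:", residuals_b)

# Non-zero values Mahout reported for case 4 (A, rank 4, symmetric false).
CASE4 = [11.34481428276208, 0.5157294715892401, 0.1709151888272022]

# Compare magnitudes only, since case 4 reports 0.5157... without the sign.
case4_matches = all(
    any(abs(abs(lam) - v) < 1e-4 for v in CASE4) for lam in A_EIGS
)
print("case 4 matches in absolute value:", case4_matches)
```

The residuals for A are not exactly zero because the eigenvalues in the message are rounded to a few decimal places.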
