mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Singular Value Decomposition does not return correct eigenvalues and -vectors
Date Thu, 29 Sep 2011 03:08:50 GMT
Also, like Ted said, using k = full rank is the same as running the regular
in-core method, in both memory and compute cost.

This method is for computing a thin SVD (k + p should not exceed perhaps 1000
for practical purposes) on otherwise really large inputs.
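
To make the roles of k and p concrete, here is a minimal in-core sketch of the general randomized (stochastic) SVD idea in NumPy. It is not Mahout's MapReduce implementation; the function name `randomized_svd` and its parameters are made up for this illustration. It projects the input onto k + p random directions, decomposes the small projected matrix, and keeps only the first k singular triplets, so the p extra values are computed but discarded.

```python
import numpy as np


def randomized_svd(a, k, p=0, seed=0):
    """Approximate rank-k SVD of `a` using k + p random projections.

    Illustrative helper only (hypothetical name), not Mahout's ssvd job.
    """
    m, n = a.shape
    if k + p > min(m, n):
        # The sketch cannot be wider than the input is tall or wide.
        raise ValueError("k + p must not exceed min(m, n)")
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((n, k + p))  # random test matrix
    y = a @ omega                            # sample the range of a: m x (k+p)
    q, _ = np.linalg.qr(y)                   # orthonormal basis of that sample
    b = q.T @ a                              # small (k+p) x n problem
    u_b, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_b                              # map left vectors back to m rows
    # k + p singular values were computed; only the first k are returned.
    return u[:, :k], s[:k], vt[:k, :]


# Demo on a tall, exactly rank-5 input, the shape this method is meant for.
rng = np.random.default_rng(42)
tall = rng.standard_normal((2000, 5)) @ rng.standard_normal((5, 40))
u, s, vt = randomized_svd(tall, k=5, p=10)
print("approximate:", s)
print("exact      :", np.linalg.svd(tall, compute_uv=False)[:5])
```

With k equal to the full rank and p = 0 the projection no longer reduces anything, so the small problem is as large as the original one, which is why nothing is gained over a regular in-core SVD in that setting.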

Please beware of misuse.

-Dmitriy
On Sep 28, 2011 1:32 PM, "Markus Holtermann" <info@markusholtermann.eu>
wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hey guys.
>
> Thank you for all the information about Singular Value Decomposition.
> The Lanczos algorithm seems to be a bad choice for small matrices. But
> the Stochastic SVD with k = full rank and p = 0 (thanks Dmitriy
> Lyubimov for implementing that) works fine.
>
> So far, Markus
>
> On 09/23/2011 08:46 PM, Dmitriy Lyubimov wrote:
>> I already fixed full rank (p=0) on the trunk. It was just an
>> invalid assertion, the algorithm isn't limiting that. So k=3 p=0
>> should be ok now in the trunk.
>>
>> On Sep 23, 2011 8:34 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>>> Markus,
>>>
>>> Try testing on a 20x20 matrix if you want to use p>0. The issue
>>> is that this is an approximation algorithm that works for
>>> reasonably high dimension.
>>> 3 is not reasonably high. 20 is probably marginal.
>>>
>>> On Fri, Sep 23, 2011 at 4:42 PM, Dmitriy Lyubimov
>>> <dlyubimov@apache.org> wrote:
>>>
>>>> oh, ok, apparently you need to use p>0.
>>>>
>>>> but then there's a problem: there's a k+p <= m (input height)
>>>> requirement, so I guess this is a corner case i did not account
>>>> for.
>>>>
>>>> you can use k=2 and p=1; the caveat is that even though 3
>>>> singular values will be computed, only 2 of them will be saved.
>>>> this solver always assumes "thin" decomposition requirements,
>>>> although the distinction is purely technical; it is only a matter
>>>> of a patch to enable p=0.
>>>>
>>>> It is only an issue because your input is so small. In practice,
>>>> input is much "longer" than k+p rows so it hasn't come up as an
>>>> issue. Point is, it will not do full rank decomposition with
>>>> small matrices; but then, you don't want to use it with small
>>>> matrices :)
>>>>
>>>> although i can engineer a patch to allow p=0 and full rank
>>>> decompositions for short wide matrices if it is that
>>>> important.
>>>>
>>>> -dmitriy
>>>>
>>>> On Fri, Sep 23, 2011 at 3:42 PM, Markus Holtermann
>>>> <info@markusholtermann.eu> wrote:
>>>>> Thank you for all your responses.
>>>>>
>>>>> ref. Dan Brickley:
>>>>> ------------------
>>>>> hopefully you did dream ;-)
>>>>>
>>>>> ref. Dmitriy Lyubimov:
>>>>> ----------------------
>>>>> When I run `mahout ssvd -i A.seq -o A-ssvd/ -k 3 -p 0` I get an
>>>>> IllegalArgumentException. You can find the traceback at
>>>>> http://paste.pocoo.org/show/481168/ .
>>>>>
>>>>> ref. Ted Dunning:
>>>>> -----------------
>>>>> I am running the M/R version of SVD in local mode. I didn't
>>>>> install Hadoop except what is coming via `mvn install`. If I
>>>>> understand the code correctly, the `--inMemory` argument is only
>>>>> relevant for the "EigenVerificationJob" -- I didn't run that.
>>>>>
>>>>> Here are the latest results for the calculations as described
>>>>> in my previous mail:
>>>>>
>>>>> For 1:
>>>>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
>>>>> Key: 0: Value: eigenVector0, eigenvalue = 11.344411508600611: {0:0.8940505788976013,1:0.05761556873901637,2:-0.44424543735613486}
>>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.0: {0:-0.3030457633656634,1:0.8081220356417685,2:-0.5050762722761053}
>>>>> Key: 2: Value: eigenVector2, eigenvalue = -0.4362482432944815: {0:0.3299042704770375,1:0.5861904313011974,2:0.7399621277956934}
>>>>> Count: 3
>>>>>
>>>>> For 2:
>>>>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
>>>>> Key: 0: Value: eigenVector0, eigenvalue = 11.344814282762082: {0:0.7369762290995766,1:0.3279852776056837,2:-0.5910090485061045}
>>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.17091518882717976: {0:0.9225878132457447,1:0.3812202473600341,2:0.05918487858557608}
>>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.0: {0:-0.5910090485061055,1:0.7369762290995774,2:-0.3279852776056802}
>>>>> Key: 3: Value: eigenVector3, eigenvalue = -0.5157294715892533: {0:-0.32798527760568197,1:-0.5910090485061036,2:-0.7369762290995783}
>>>>> Count: 4
>>>>>
>>>>> For 3:
>>>>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
>>>>> Key: 0: Value: eigenVector0, eigenvalue = 11.344814080004587: {0:0.2870124314018251,1:-0.8054865010309287,2:0.5184740696291035}
>>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.4852290375835231: {0:0.9000472484774761,1:0.041469409433508436,2:-0.4338147514658307}
>>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.0: {0:0.3279311127797073,1:0.5911613863727806,2:0.7368781449689461}
>>>>> Count: 3
>>>>>
>>>>> For 4:
>>>>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
>>>>> Key: 0: Value: eigenVector0, eigenvalue = 11.34481428276208: {0:0.788451139115581,1:0.5058848349238699,2:0.3498933194866569}
>>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.5157294715892401: {0:-0.5910090485061453,1:0.7369762290995597,2:-0.32798527760564816}
>>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.1709151888272022: {0:-0.7369762290995447,1:-0.3279852776057236,2:0.5910090485061223}
>>>>> Key: 3: Value: eigenVector3, eigenvalue = 0.0: {0:-0.3279852776056819,1:-0.5910090485061036,2:-0.7369762290995783}
>>>>> Count: 4
>>>>>
>>>>> For 5:
>>>>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
>>>>> Key: 0: Value: eigenVector0, eigenvalue = 7.7949818262315: {0:-0.3998289016610171,1:0.3486764982772797,2:0.8476800982361441}
>>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.0: {0:0.3244428422615253,1:-0.8111071056538125,2:0.4866642633922878}
>>>>> Key: 2: Value: eigenVector2, eigenvalue = -2.2686660367578133: {0:0.8572477421969729,1:0.4696061783100697,2:0.21117846905213422}
>>>>> Count: 3
>>>>>
>>>>> For 6:
>>>>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
>>>>> Key: 0: Value: eigenVector0, eigenvalue = 9.903422603237882: {0:-0.305869782876591,1:-0.012493432384138303,2:0.9519913813004245}
>>>>> Key: 1: Value: eigenVector1, eigenvalue = 6.002722238353203: {0:-0.7781330995244824,1:0.06366543541563939,2:0.624864458709054}
>>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.0: {0:0.2988138112963618,1:0.9481291552697455,2:0.10845003967736172}
>>>>> Key: 3: Value: eigenVector3, eigenvalue = -3.906144841591079: {0:0.9039656974142156,1:-0.3176397630567398,2:0.2862708487144453}
>>>>> Count: 4
>>>>>
>>>>> For 7:
>>>>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
>>>>> Key: 0: Value: eigenVector0, eigenvalue = 7.04924152040162: {0:-0.4082482904638631,1:0.8164965809277261,2:-0.4082482904638631}
>>>>> Key: 1: Value: eigenVector1, eigenvalue = 3.782617346103868: {0:0.7808892910047764,1:0.08072916428282848,2:-0.6194309624391194}
>>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.0: {0:0.47280571964327067,1:0.5716783495703939,2:0.6705509794975171}
>>>>> Count: 3
>>>>>
>>>>> For 8:
>>>>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
>>>>> Key: 0: Value: eigenVector0, eigenvalue = 7.964450219004663: {0:NaN,1:NaN,2:NaN}
>>>>> Key: 1: Value: eigenVector1, eigenvalue = 7.000000000000002: {0:NaN,1:NaN,2:NaN}
>>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.753347668076679: {0:NaN,1:NaN,2:NaN}
>>>>> Key: 3: Value: eigenVector3, eigenvalue = 0.0: {0:NaN,1:NaN,2:NaN}
>>>>> Count: 4
>>>>>
>>>>>
>>>>> ref. Danny Bickson:
>>>>> -------------------
>>>>> Thanks for your confirmation on how to use the rank. Regarding
>>>>> the scale factor and orthogonalization: Yes, I take it into
>>>>> account. I'm running SVD from trunk without any changes. And even
>>>>> after commenting out those parts of the code, the results are
>>>>> still wrong in the cases 1, 2, 3, 7 and 8.
>>>>>
>>>>> Thank you for your help.
>>>>>
>>>>> Markus
>>>>>
>>>>>
>>>>>> On 22 Sep 2011, at 18:37, Markus Holtermann
>>>>>> <info@markusholtermann.eu> wrote:
>>>>>>
>>>>>>> Hello there,
>>>>>>>
>>>>>>> I'm trying to run Mahout's Singular Value Decomposition
>>>>>>> but realized that the resulting eigenvalues are wrong in
>>>>>>> most cases. So I took two small 3x3 matrices,
>>>>>>> calculated their eigenvalues and eigenvectors by hand, and
>>>>>>> compared the results to Mahout.
>>>>>>>
>>>>>>> In only one of eight cases did Mahout's results match my
>>>>>>> pen & paper calculation.
>>>>>>>
>>>>>>> Let's take A = {{1,2,3},{2,4,5},{3,5,6}} and B =
>>>>>>> {{5,2,4},{-3,6,2},{3,-3,1}}
>>>>>>>
>>>>>>> As you can see, A is symmetric, B is not.
>>>>>>>
>>>>>>> I ran `mahout svd --output out/ --numRows 3 --numCols 3`
>>>>>>> eight times with different arguments:
>>>>>>>
>>>>>>> 1) --input A --rank 3 --symmetric true   result is wrong
>>>>>>> 2) --input A --rank 4 --symmetric true   result is wrong
>>>>>>> 3) --input A --rank 3 --symmetric false  result is wrong
>>>>>>> 4) --input A --rank 4 --symmetric false  result is CORRECT
>>>>>>>
>>>>>>> 5) --input B --rank 3 --symmetric true   result is wrong
>>>>>>> 6) --input B --rank 4 --symmetric true   result is wrong
>>>>>>> 7) --input B --rank 3 --symmetric false  result is wrong
>>>>>>> 8) --input B --rank 4 --symmetric false  result is wrong
>>>>>>>
>>>>>>> To verify that my input data is correct, this is the
>>>>>>> result of `mahout seqdumper`
>>>>>>>
>>>>>>> For A:
>>>>>>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
>>>>>>> Key: 0: Value: {0:1.0,1:2.0,2:3.0}
>>>>>>> Key: 1: Value: {0:2.0,1:4.0,2:5.0}
>>>>>>> Key: 2: Value: {0:3.0,1:5.0,2:6.0}
>>>>>>> Count: 3
>>>>>>>
>>>>>>>
>>>>>>> For B:
>>>>>>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
>>>>>>> Key: 0: Value: {0:5.0,1:2.0,2:4.0}
>>>>>>> Key: 1: Value: {0:-3.0,1:6.0,2:2.0}
>>>>>>> Key: 2: Value: {0:3.0,1:-3.0,2:1.0}
>>>>>>> Count: 3
>>>>>>>
>>>>>>>
>>>>>>> And finally, the correct eigenvalues should be:
>>>>>>>
>>>>>>> For A: λ1 = 11.3448, λ2 = -0.515729, λ3 = 0.170915
>>>>>>>
>>>>>>> For B: λ1 = 7, λ2 = 3, λ3 = 2
>>>>>>>
>>>>>>> So, are there any known bugs in Mahout's SVD
>>>>>>> implementation? Am I doing something wrong? Is this
>>>>>>> algorithm known to produce wrong results?
>>>>>>>
>>>>>>> Thanks in advance.
>>>>>>>
>>>>>>> Markus
>>>>>
>>>>>
>>>>
>>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk6DhEIACgkQA8JzLzUe2LNSHwCgpc/ZgUXPaq0aNwrbcPGH4AXB
> MVgAnjrgbceGHNHcHheCPPGydoAvcr57
> =DBHE
> -----END PGP SIGNATURE-----
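
As a cross-check of the numbers quoted in this thread, the sketch below uses NumPy (not Mahout) to compute both the eigenvalues and the singular values of the two 3x3 matrices from the original message. The variable names are chosen for this illustration only. For a symmetric matrix such as A the singular values are the absolute values of the eigenvalues; for a non-symmetric matrix such as B the two sets generally differ.

```python
import numpy as np

# The two matrices from the original message.
A = np.array([[1, 2, 3], [2, 4, 5], [3, 5, 6]], dtype=float)
B = np.array([[5, 2, 4], [-3, 6, 2], [3, -3, 1]], dtype=float)

# Eigenvalues: eigvalsh for the symmetric A, the general solver for B.
eig_a = np.linalg.eigvalsh(A)
eig_b = np.linalg.eigvals(B)

# Singular values, returned in descending order.
sv_a = np.linalg.svd(A, compute_uv=False)
sv_b = np.linalg.svd(B, compute_uv=False)

print("A eigenvalues     :", np.sort(eig_a))
print("A singular values :", sv_a)
print("B eigenvalues     :", np.sort(eig_b.real))
print("B singular values :", sv_b)
```

The singular values of A agree with the nonzero "eigenvalue" fields listed for case 4, and those of B agree with the nonzero fields listed for case 8 (about 7.964, 7.0 and 0.753), while the hand-computed eigenvalues of B are 7, 3 and 2. This suggests, without claiming anything about Mahout's internals, that the rank-4 non-symmetric runs report singular values rather than eigenvalues.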
