spark-user mailing list archives

From Iman Mohtashemi <iman.mohtash...@gmail.com>
Subject Re: TallSkinnyQR
Date Fri, 02 Dec 2016 23:03:35 GMT
Thanks again! This is very helpful!
Best regards,
Iman

On Dec 2, 2016 2:49 PM, "Huamin Li" <3ericli@gmail.com> wrote:

> Hi Iman,
>
> You can get my code from https://github.com/hl475/svd/tree/testSVD. In
> addition to fixing the index issue for IndexedRowMatrix (
> https://issues.apache.org/jira/browse/SPARK-8614), I have made the
> following changes as well:
>
> (1) Add tallSkinnySVD and computeSVDbyGram to IndexedRowMatrix.
> (2) Add shuffle.scala to
> mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/ (you need
> this if you want to use tallSkinnySVD). There was a bug in the shuffle
> method in breeze, and I sent a pull request for it:
> https://github.com/scalanlp/breeze/pull/571. However, that pull request was
> merged into breeze 0.13, whereas the current Spark uses breeze 0.12.
> (3) Add partialSVD to BlockMatrix which computes the randomized singular
> value decomposition of a given BlockMatrix.
>
> The new SVD methods (tallSkinnySVD, computeSVDbyGram, and partialSVD) are
> in beta right now. You are welcome to test them and share your feedback
> with me!
>
> I implemented this code for my summer internship project with Mark Tygert,
> and we are currently testing its performance.
>
> Best,
> Huamin
>
> On Fri, Dec 2, 2016 at 2:07 PM, Iman Mohtashemi
> <iman.mohtashemi@gmail.com> wrote:
>
>> Great, thanks! Where can I get the latest version with the bug fixes?
>> best regards,
>> Iman
>>
>> On Fri, Dec 2, 2016 at 10:54 AM Huamin Li <3ericli@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> There seems to be a bug in the section of code that converts the
>>> RowMatrix format back into IndexedRowMatrix format.
>>>
>>> For RowMatrix, I think the singular values and right singular vectors
>>> (not the left singular vectors U) that computeSVD computes are correct when
>>> using multiple executors/machines; only the R (not the Q) in tallSkinnyQR
>>> is correct when using multiple executors/machines. U and Q were being
>>> stored in RowMatrix format. A RowMatrix carries no row-index information,
>>> so that format does not make sense for U and Q, whose rows have to line up
>>> with the rows of the input.
>>>
>>> Others have run into this same problem
>>> (https://issues.apache.org/jira/browse/SPARK-8614).
>>>
>>> I think the quick solution for this problem is to copy the multiply,
>>> computeSVD, and tallSkinnyQR code from RowMatrix into IndexedRowMatrix
>>> and make the corresponding changes, although this would result in code
>>> duplication.
>>>
>>> I have fixed the problem in the way described above. Now multiply,
>>> computeSVD, and tallSkinnyQR give the correct results for
>>> IndexedRowMatrix when using multiple executors or workers. Let me know
>>> if I should do a pull request for this.
>>>
>>> Best,
>>> Huamin
>>>
>>> On Fri, Dec 2, 2016 at 11:23 AM, Iman Mohtashemi <
>>> iman.mohtashemi@gmail.com> wrote:
>>>
>>> Ok thanks.
>>>
>>> On Fri, Dec 2, 2016 at 8:19 AM Sean Owen <sowen@cloudera.com> wrote:
>>>
>>> I tried, but enforcing the ordering changed a fair bit of behavior and I
>>> gave up. I think the way to think of it is: a RowMatrix has whatever
>>> ordering you made it with, so you need to give it ordered rows if you're
>>> going to use a method like the QR decomposition. That works. I don't think
>>> the QR method should ever have been on this class though, for this reason.
>>>
>>> On Fri, Dec 2, 2016 at 4:13 PM Iman Mohtashemi <
>>> iman.mohtashemi@gmail.com> wrote:
>>>
>>> Hi guys,
>>> Was this bug ever resolved?
>>> Iman
>>>
>>> On Fri, Nov 11, 2016 at 9:59 AM Iman Mohtashemi <
>>> iman.mohtashemi@gmail.com> wrote:
>>>
>>> Yes, this would be helpful; otherwise the Q part of the decomposition is
>>> useless. One can use Q to solve the system Ax = b: transpose Q, multiply
>>> it by b, and solve Rx = Qt*b for x. The upper triangular matrix (R) is
>>> already correctly available.
>>>
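The solve described above can be sketched in a few lines. This is a minimal, Spark-free illustration rather than code from the thread: the class QrSolve and its method names are hypothetical, Q is assumed to have orthonormal columns, and R is assumed to be upper triangular with a nonzero diagonal. The rows of Q must be in the same order as the entries of b, which is exactly what breaks when Q comes back with shuffled rows.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Spark): solves min ||Ax - b|| from a thin QR of A. */
final class QrSolve {

    /** Computes c = Q^T b for an m x n matrix Q stored row by row. */
    static double[] qTransposeTimes(double[][] q, double[] b) {
        int n = q[0].length;
        double[] c = new double[n];
        for (int i = 0; i < q.length; i++) {
            for (int j = 0; j < n; j++) {
                c[j] += q[i][j] * b[i];
            }
        }
        return c;
    }

    /** Solves R x = c by back substitution, starting from the last row. */
    static double[] backSubstitute(double[][] r, double[] c) {
        int n = c.length;
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double sum = c[i];
            for (int j = i + 1; j < n; j++) {
                sum -= r[i][j] * x[j];
            }
            x[i] = sum / r[i][i];
        }
        return x;
    }

    /** Combines both steps: x = R^{-1} (Q^T b). */
    static double[] solve(double[][] q, double[][] r, double[] b) {
        return backSubstitute(r, qTransposeTimes(q, b));
    }

    public static void main(String[] args) {
        // Toy factors chosen for easy arithmetic, not taken from the thread.
        double[][] q = {{1, 0}, {0, 1}, {0, 0}};
        double[][] r = {{2, 1}, {0, 4}};
        double[] b = {4, 8, 5};
        System.out.println(Arrays.toString(solve(q, r, b))); // prints [1.0, 2.0]
    }
}
```

Only R and the product Qt*b enter the triangular solve, so a correct R alone is not enough: Qt*b is wrong whenever the rows of Q no longer match the rows of b.
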
>>> On Fri, Nov 11, 2016 at 3:56 AM Sean Owen <sowen@cloudera.com> wrote:
>>>
>>> @Xiangrui / @Joseph, do you think it would be reasonable to have
>>> CoordinateMatrix sort the rows it creates to make an IndexedRowMatrix, in
>>> order to make the ultimate output of toRowMatrix less surprising when it's
>>> not ordered?
>>>
>>>
>>> On Tue, Nov 8, 2016 at 3:29 PM Sean Owen <sowen@cloudera.com> wrote:
>>>
>>> I think the problem here is that IndexedRowMatrix.toRowMatrix does *not*
>>> result in a RowMatrix with rows in order of their indices, necessarily:
>>>
>>>
>>> // Drop its row indices.
>>> RowMatrix rowMat = indexedRowMatrix.toRowMatrix();
>>>
>>> What you get is a matrix where the rows are arranged in whatever order
>>> they were passed to IndexedRowMatrix. RowMatrix says it's for rows where
>>> the ordering doesn't matter, but then it's maybe surprising it has a QR
>>> decomposition method, because clearly the result depends on the order of
>>> rows in the input. (CC Yuhao Yang for a comment?)
>>>
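The dependence on row order can be checked locally. The sketch below is an illustration under stated assumptions, not Spark's implementation: it uses plain arrays and modified Gram-Schmidt (a swapped-in technique chosen for brevity), the class name QrRowOrderDemo is hypothetical, and the row shuffle is arbitrary. The input is the 5 x 3 example matrix from this thread. Gram-Schmidt produces a positive diagonal in R, so signs can differ from the output of other QR routines.

```java
/** Hypothetical local demo (plain arrays, no Spark): how row order affects a thin QR. */
final class QrRowOrderDemo {

    /** Thin QR by modified Gram-Schmidt; returns {Q, R}. Assumes full column rank. */
    static double[][][] qr(double[][] a) {
        int m = a.length, n = a[0].length;
        double[][] q = new double[m][];
        for (int i = 0; i < m; i++) q[i] = a[i].clone();
        double[][] r = new double[n][n];
        for (int k = 0; k < n; k++) {
            // Normalise column k; its norm becomes the diagonal entry of R.
            double normSq = 0.0;
            for (int i = 0; i < m; i++) normSq += q[i][k] * q[i][k];
            r[k][k] = Math.sqrt(normSq);
            for (int i = 0; i < m; i++) q[i][k] /= r[k][k];
            // Remove the component along column k from every later column.
            for (int j = k + 1; j < n; j++) {
                double dot = 0.0;
                for (int i = 0; i < m; i++) dot += q[i][k] * q[i][j];
                r[k][j] = dot;
                for (int i = 0; i < m; i++) q[i][j] -= dot * q[i][k];
            }
        }
        return new double[][][] {q, r};
    }

    /** Returns a copy of a whose i-th row is a[order[i]]. */
    static double[][] permuteRows(double[][] a, int[] order) {
        double[][] out = new double[order.length][];
        for (int i = 0; i < order.length; i++) out[i] = a[order[i]].clone();
        return out;
    }

    /** Largest absolute entrywise difference between two equally shaped matrices. */
    static double maxAbsDiff(double[][] x, double[][] y) {
        double max = 0.0;
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x[i].length; j++) {
                max = Math.max(max, Math.abs(x[i][j] - y[i][j]));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        double[][] a = {
            {0.42, 0.28, 0.89},
            {0.83, 0.34, 0.42},
            {0.23, 0.0, 0.0},
            {0.42, 0.98, 0.88},
            {0.23, 0.36, 0.97}
        };
        int[] order = {4, 0, 1, 3, 2}; // arbitrary shuffle of the row indices
        double[][][] original = qr(a);
        double[][][] shuffled = qr(permuteRows(a, order));
        // R is unaffected by the shuffle (up to rounding) ...
        System.out.println("max |R - R'|   = " + maxAbsDiff(original[1], shuffled[1]));
        // ... while Q' is Q with its rows shuffled in the same way.
        System.out.println("max |P*Q - Q'| = "
            + maxAbsDiff(permuteRows(original[0], order), shuffled[0]));
    }
}
```

Both printed differences should be at rounding level. That matches the symptom reported in the thread: R is usable as is, but Q only makes sense if you know which input row each of its rows belongs to.
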
>>> You could say, well, why doesn't IndexedRowMatrix.toRowMatrix return at
>>> least something with sorted rows? That would not be hard. It also won't
>>> return "missing" rows (all zeroes), so it would not in any event result in
>>> a RowMatrix whose implicit rows and ordering represented the same matrix.
>>> That, at least, strikes me as something to be better documented.
>>>
>>> Maybe it would be nicer still to at least sort the rows, given the
>>> existence of use cases like yours. For example, at least
>>> CoordinateMatrix.toIndexedRowMatrix could sort; that would be less
>>> surprising.
>>>
>>> In any event you should be able to make it work by manually getting the
>>> RDD[IndexedRow] out of IndexedRowMatrix, sorting by index, then mapping it
>>> to Vectors and making a RowMatrix from it.
>>>
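The logic of that workaround can be sketched without a cluster. The block below is a plain-Java stand-in, not Spark code: IndexedRowSketch and orderedRows are hypothetical names, and in Spark the same steps would run on the RDD[IndexedRow] instead of a List. Filling in absent indices with zero rows goes beyond the sort-only workaround; it is added here as an assumption to address the "missing" rows mentioned earlier. Indices are assumed to be unique and non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical local stand-in for an indexed row: a row index plus its values. */
final class IndexedRowSketch {
    final long index;
    final double[] values;

    IndexedRowSketch(long index, double[] values) {
        this.index = index;
        this.values = values;
    }

    /**
     * Sorts rows by index and inserts an all-zero row for every index that is
     * absent, so that position i of the result is row i of the matrix.
     */
    static List<double[]> orderedRows(List<IndexedRowSketch> rows, int numCols) {
        List<IndexedRowSketch> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingLong((IndexedRowSketch r) -> r.index));
        List<double[]> out = new ArrayList<>();
        long next = 0; // index the next emitted row must have
        for (IndexedRowSketch r : sorted) {
            while (next < r.index) {
                out.add(new double[numCols]); // gap: emit a zero row
                next++;
            }
            out.add(r.values.clone());
            next = r.index + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // Rows arrive out of order, and index 2 is absent.
        List<IndexedRowSketch> rows = Arrays.asList(
            new IndexedRowSketch(3, new double[] {0.42, 0.98}),
            new IndexedRowSketch(0, new double[] {0.42, 0.28}),
            new IndexedRowSketch(1, new double[] {0.83, 0.34}));
        for (double[] row : orderedRows(rows, 2)) {
            System.out.println(Arrays.toString(row));
        }
        // Rows come out in index order 0, 1, 2, 3; row 2 is all zeros.
    }
}
```

Trailing all-zero rows after the largest index are not added, because the sketch has no separate notion of the total row count.
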
>>>
>>>
>>> On Tue, Nov 8, 2016 at 2:41 PM Iman Mohtashemi <
>>> iman.mohtashemi@gmail.com> wrote:
>>>
>>> Hi Sean,
>>> Here you go:
>>>
>>> sparsematrix.txt =
>>>
>>> row,col,val
>>> 0,0,.42
>>> 0,1,.28
>>> 0,2,.89
>>> 1,0,.83
>>> 1,1,.34
>>> 1,2,.42
>>> 2,0,.23
>>> 3,0,.42
>>> 3,1,.98
>>> 3,2,.88
>>> 4,0,.23
>>> 4,1,.36
>>> 4,2,.97
>>>
>>> The vector is just the third column of the matrix which should give the
>>> trivial solution of [0,0,1]
>>>
>>> This translates to the following matrix, which is correct. There are
>>> zeros in the matrix (it is not really sparse; this is just an example):
>>> 0.42  0.28  0.89
>>> 0.83  0.34  0.42
>>> 0.23  0.0   0.0
>>> 0.42  0.98  0.88
>>> 0.23  0.36  0.97
>>>
>>>
>>> Here is what I get for Q and R:
>>>
>>> Q:
>>> -0.21470961288429483   0.23590615093828807   0.6784910613691661
>>> -0.3920784235278427   -0.06171221388256143   0.5847874866876442
>>> -0.7748216464954987   -0.4003560542230838   -0.29392323671555354
>>> -0.3920784235278427    0.8517909521421976   -0.31435038559403217
>>> -0.21470961288429483  -0.23389547730301666  -0.11165321782745863
>>>
>>> R:
>>> -1.0712142642814275  -0.8347536340918976  -1.227672225670157
>>>  0.0                  0.7662808691141717   0.7553315911660984
>>>  0.0                  0.0                  0.7785210939368136
>>>
>>> When I run this in MATLAB, the numbers are the same, but row 1 is the
>>> last row and the last row is interchanged with row 3.
>>>
>>>
>>>
>>> On Mon, Nov 7, 2016 at 11:35 PM Sean Owen <sowen@cloudera.com> wrote:
>>>
>>> Rather than post a large section of code, please post a small example of
>>> the input matrix and its decomposition, to illustrate what you're saying is
>>> out of order.
>>>
>>> On Tue, Nov 8, 2016 at 3:50 AM im281 <iman.mohtashemi@gmail.com> wrote:
>>>
>>> I am getting the correct rows but they are out of order. Is this a bug
>>> or am I doing something wrong?
>>>
>>>
>>>
>>>
>
