Hi guys,
Here is another problem I encountered using tallSkinnyQR. I've
attached some clear documentation of the problem. I posted it on the forum,
but I'm not sure if it went through.
Best regards,
Iman
On Fri, Dec 30, 2016 at 9:22 AM Sean Owen <sowen@cloudera.com> wrote:
> There are no changes to Spark at all here. See my workaround below.
>
> On Fri, Dec 30, 2016, 17:18 Iman Mohtashemi <iman.mohtashemi@gmail.com>
> wrote:
>
>> Hi guys,
>> Are your changes/bug fixes reflected in the Spark 2.1 release?
>> Iman
>>
>> On Dec 2, 2016 3:03 PM, "Iman Mohtashemi" <iman.mohtashemi@gmail.com>
>> wrote:
>>
>> Thanks again! This is very helpful!
>> Best regards,
>> Iman
>>
>> On Dec 2, 2016 2:49 PM, "Huamin Li" <3ericli@gmail.com> wrote:
>>
>> Hi Iman,
>>
>> You can get my code from https://github.com/hl475/svd/tree/testSVD. In
>> addition to fixing the index issue for IndexedRowMatrix (
>> https://issues.apache.org/jira/browse/SPARK8614), I have made the
>> following changes as well:
>>
>> (1) Add tallSkinnySVD and computeSVDbyGram to IndexedRowMatrix.
>> (2) Add shuffle.scala to
>> mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/ (you need
>> this if you want to use tallSkinnySVD). There was a bug in the shuffle
>> method in breeze, and I sent a pull request to
>> https://github.com/scalanlp/breeze/pull/571. However, the pull request
>> was merged into breeze 0.13, whereas the version of breeze that Spark
>> currently uses is 0.12.
>> (3) Add partialSVD to BlockMatrix which computes the randomized singular
>> value decomposition of a given BlockMatrix.
>>
>> The new SVD methods (tallSkinnySVD, computeSVDbyGram, and partialSVD) are
>> in beta right now. You are welcome to test them and share your
>> feedback with me!
>>
>> I implemented this code for my summer intern project with Mark Tygert,
>> and we are currently testing the performance of the new code.
>>
>> Best,
>> Huamin
>>
>> On Fri, Dec 2, 2016 at 2:07 PM, Iman Mohtashemi <
>> iman.mohtashemi@gmail.com> wrote:
>>
>> Great thanks! Where can I get the latest with the bug fixes?
>> best regards,
>> Iman
>>
>> On Fri, Dec 2, 2016 at 10:54 AM Huamin Li <3ericli@gmail.com> wrote:
>>
>> Hi,
>>
>> There seems to be a bug in the section of code that converts the
>> RowMatrix format back into IndexedRowMatrix format.
>>
>> For RowMatrix, I think the singular values and right singular vectors
>> (not the left singular vectors U) that computeSVD computes are correct when
>> using multiple executors/machines; only the R (not the Q) in tallSkinnyQR
>> is correct when using multiple executors/machines. U and Q were being
>> stored in RowMatrix format. A RowMatrix carries no row index information,
>> so it is not a meaningful container for U and Q.
>>
>> Others have run into this same problem (
>> https://issues.apache.org/jira/browse/SPARK8614)
>>
>> I think the quick solution for this problem is to copy and paste the
>> multiply, computeSVD, and tallSkinnyQR code from RowMatrix to
>> IndexedRowMatrix and make the corresponding changes, although this would
>> result in code duplication.
>>
>> I have fixed the problem as described above. Now multiply,
>> computeSVD, and tallSkinnyQR give the correct results for
>> IndexedRowMatrix when using multiple executors or workers. Let me know
>> if I should open a pull request for this.
>>
>> Best,
>> Huamin
>>
>> On Fri, Dec 2, 2016 at 11:23 AM, Iman Mohtashemi <
>> iman.mohtashemi@gmail.com> wrote:
>>
>> Ok thanks.
>>
>> On Fri, Dec 2, 2016 at 8:19 AM Sean Owen <sowen@cloudera.com> wrote:
>>
>> I tried, but enforcing the ordering changed a fair bit of behavior and I
>> gave up. I think the way to think of it is: a RowMatrix has whatever
>> ordering you made it with, so you need to give it ordered rows if you're
>> going to use a method like the QR decomposition. That works. I don't think
>> the QR method should ever have been on this class though, for this reason.
>>
>> On Fri, Dec 2, 2016 at 4:13 PM Iman Mohtashemi <iman.mohtashemi@gmail.com>
>> wrote:
>>
>> Hi guys,
>> Was this bug ever resolved?
>> Iman
>>
>> On Fri, Nov 11, 2016 at 9:59 AM Iman Mohtashemi <
>> iman.mohtashemi@gmail.com> wrote:
>>
>> Yes, this would be helpful; otherwise the Q part of the decomposition is
>> useless. One can use Q to solve the system Ax = b: transpose Q, multiply
>> it by b, and solve Rx = Qt*b for x, since the upper triangular matrix R
>> is correctly available.
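
The solve described above can be sketched in plain Java, assuming Q and R are already available as dense arrays. This is only an illustration and does not use the Spark API; the class name, method names, and the toy factors in main are hypothetical and are not taken from the thread. Note that Q^T b only makes sense when the rows of Q are in the same order as the entries of b, which is why a Q with permuted rows is unusable here.

```java
import java.util.Arrays;

// Plain-Java sketch (no Spark): solve R x = Q^T b by back substitution.
final class QrSolveSketch {

    // Computes y = Q^T b for an m x n matrix Q stored row by row.
    // Row i of Q must correspond to entry i of b.
    static double[] transposeTimes(double[][] q, double[] b) {
        int n = q[0].length;
        double[] y = new double[n];
        for (int i = 0; i < q.length; i++) {
            for (int j = 0; j < n; j++) {
                y[j] += q[i][j] * b[i];
            }
        }
        return y;
    }

    // Solves R x = y for an upper-triangular R with a nonzero diagonal.
    static double[] backSubstitute(double[][] r, double[] y) {
        int n = y.length;
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double sum = y[i];
            for (int j = i + 1; j < n; j++) {
                sum -= r[i][j] * x[j];
            }
            x[i] = sum / r[i][i];
        }
        return x;
    }

    // Combines the two steps: x solves R x = Q^T b.
    static double[] solve(double[][] q, double[][] r, double[] b) {
        return backSubstitute(r, transposeTimes(q, b));
    }

    public static void main(String[] args) {
        // Toy factors chosen for easy arithmetic (an assumption for illustration).
        double[][] q = {{1, 0}, {0, 1}, {0, 0}};
        double[][] r = {{2, 1}, {0, 3}};
        double[] b = {4, 3, 7};
        System.out.println(Arrays.toString(solve(q, r, b)));
    }
}
```

Swapping two rows of q in this sketch without swapping the matching entries of b changes the result, which mirrors the ordering problem discussed in this thread.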
>>
>> On Fri, Nov 11, 2016 at 3:56 AM Sean Owen <sowen@cloudera.com> wrote:
>>
>> @Xiangrui / @Joseph, do you think it would be reasonable to have
>> CoordinateMatrix sort the rows it creates when making an IndexedRowMatrix,
>> in order to make the ultimate output of toRowMatrix less surprising when
>> it's not ordered?
>>
>>
>> On Tue, Nov 8, 2016 at 3:29 PM Sean Owen <sowen@cloudera.com> wrote:
>>
>> I think the problem here is that IndexedRowMatrix.toRowMatrix does *not*
>> result in a RowMatrix with rows in order of their indices, necessarily:
>>
>>
>> // Drop its row indices.
>> RowMatrix rowMat = indexedRowMatrix.toRowMatrix();
>>
>> What you get is a matrix where the rows are arranged in whatever order
>> they were passed to IndexedRowMatrix. RowMatrix says it's for rows where
>> the ordering doesn't matter, but then it's maybe surprising it has a QR
>> decomposition method, because clearly the result depends on the order of
>> rows in the input. (CC Yuhao Yang for a comment?)
>>
>> You could say, well, why doesn't IndexedRowMatrix.toRowMatrix return at
>> least something with sorted rows? That would not be hard. It also won't
>> return "missing" rows (all zeroes), so it would not in any event result in
>> a RowMatrix whose implicit rows and ordering represented the same matrix.
>> That, at least, strikes me as something to be better documented.
>>
>> Maybe it would be nicer still to at least sort the rows, given the
>> existence of use cases like yours. For example, at least
>> CoordinateMatrix.toIndexedRowMatrix could sort? That is less surprising.
>>
>> In any event you should be able to make it work by manually getting the
>> RDD[IndexedRow] out of IndexedRowMatrix, sorting by index, then mapping it
>> to Vectors and making a RowMatrix from it.
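
The workaround above (sort by index, then drop the indices) can be modeled in plain Java as a minimal sketch. It deliberately does not call Spark: IndexedRowModel is a hypothetical stand-in for an indexed row, and the in-memory list stands in for the distributed collection of rows. In Spark the same two steps would be applied to the rows taken out of the IndexedRowMatrix before building the RowMatrix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java model (no Spark) of "sort by index, then drop the indices".
final class SortRowsSketch {

    // Hypothetical stand-in for an indexed row: a row index plus its values.
    static final class IndexedRowModel {
        final long index;
        final double[] values;

        IndexedRowModel(long index, double[] values) {
            this.index = index;
            this.values = values;
        }
    }

    // Returns the row values in ascending index order.
    // Indices with no row are NOT filled in with zero rows.
    static List<double[]> toOrderedRows(List<IndexedRowModel> rows) {
        List<IndexedRowModel> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingLong((IndexedRowModel r) -> r.index));
        List<double[]> ordered = new ArrayList<>();
        for (IndexedRowModel r : sorted) {
            ordered.add(r.values);
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Rows arrive in arbitrary order, as they might from separate partitions.
        List<IndexedRowModel> rows = new ArrayList<>();
        rows.add(new IndexedRowModel(2L, new double[]{0.23, 0.0, 0.0}));
        rows.add(new IndexedRowModel(0L, new double[]{0.42, 0.28, 0.89}));
        rows.add(new IndexedRowModel(1L, new double[]{0.83, 0.34, 0.42}));
        for (double[] row : toOrderedRows(rows)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

As the comment notes, sorting alone does not restore rows that are entirely zero, so the resulting matrix matches the original only when every index has a row.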
>>
>>
>>
>> On Tue, Nov 8, 2016 at 2:41 PM Iman Mohtashemi <iman.mohtashemi@gmail.com>
>> wrote:
>>
>> Hi Sean,
>> Here you go:
>>
>> sparsematrix.txt =
>>
>> row,col,val
>> 0,0,.42
>> 0,1,.28
>> 0,2,.89
>> 1,0,.83
>> 1,1,.34
>> 1,2,.42
>> 2,0,.23
>> 3,0,.42
>> 3,1,.98
>> 3,2,.88
>> 4,0,.23
>> 4,1,.36
>> 4,2,.97
>>
>> The vector is just the third column of the matrix, which should give the
>> trivial solution of [0,0,1].
>>
>> The entries translate to the following matrix, which is correct.
>> There are zeros in the matrix (not really sparse, just an example):
>> 0.42 0.28 0.89
>> 0.83 0.34 0.42
>> 0.23 0.0 0.0
>> 0.42 0.98 0.88
>> 0.23 0.36 0.97
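
How the (row, col, val) entries expand into the dense matrix above can be sketched in plain Java. This is an illustration only and does not use Spark's CoordinateMatrix; the class and method names are hypothetical, and the 5 x 3 shape is read off the listed entries.

```java
import java.util.Arrays;

// Plain-Java sketch (no Spark): expand (row, col, value) entries into a dense matrix.
final class CoordinateToDenseSketch {

    // Each entry is {row, col, value}; cells that are not listed stay 0.0.
    static double[][] toDense(double[][] entries, int numRows, int numCols) {
        double[][] dense = new double[numRows][numCols];
        for (double[] e : entries) {
            dense[(int) e[0]][(int) e[1]] = e[2];
        }
        return dense;
    }

    public static void main(String[] args) {
        // The entries listed in sparsematrix.txt.
        double[][] entries = {
            {0, 0, .42}, {0, 1, .28}, {0, 2, .89},
            {1, 0, .83}, {1, 1, .34}, {1, 2, .42},
            {2, 0, .23},
            {3, 0, .42}, {3, 1, .98}, {3, 2, .88},
            {4, 0, .23}, {4, 1, .36}, {4, 2, .97}
        };
        for (double[] row : toDense(entries, 5, 3)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Row 2 has a single listed entry, so its other two cells come out as zeros, matching the third row of the matrix shown above.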
>>
>>
>> Here is what I get for the Q and R
>>
>> Q: 0.21470961288429483 0.23590615093828807 0.6784910613691661
>> 0.3920784235278427 0.06171221388256143 0.5847874866876442
>> 0.7748216464954987 0.4003560542230838 0.29392323671555354
>> 0.3920784235278427 0.8517909521421976 0.31435038559403217
>> 0.21470961288429483 0.23389547730301666 0.11165321782745863
>> R: 1.0712142642814275 0.8347536340918976 1.227672225670157
>> 0.0 0.7662808691141717 0.7553315911660984
>> 0.0 0.0 0.7785210939368136
>>
>> When running this in MATLAB, the numbers are the same, but row 1 is the
>> last row and the last row is interchanged with row 3.
>>
>>
>>
>> On Mon, Nov 7, 2016 at 11:35 PM Sean Owen <sowen@cloudera.com> wrote:
>>
>> Rather than post a large section of code, please post a small example of
>> the input matrix and its decomposition, to illustrate what you're saying is
>> out of order.
>>
>> On Tue, Nov 8, 2016 at 3:50 AM im281 <iman.mohtashemi@gmail.com> wrote:
>>
>> I am getting the correct rows but they are out of order. Is this a bug or
>> am I doing something wrong?
>>
>>
>>
>>
>>
