spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: TallSkinnyQR
Date Tue, 08 Nov 2016 15:29:01 GMT
I think the problem here is that IndexedRowMatrix.toRowMatrix does *not*
necessarily result in a RowMatrix whose rows are in order of their indices:

// Drop its row indices.
RowMatrix rowMat = indexedRowMatrix.toRowMatrix();

What you get is a matrix where the rows are arranged in whatever order they
were passed to IndexedRowMatrix. RowMatrix says it's for rows where the
ordering doesn't matter, but then it's maybe surprising it has a QR
decomposition method, because clearly the result depends on the order of
rows in the input. (CC Yuhao Yang for a comment?)
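To make the order dependence concrete, here is a minimal plain-Java sketch (no Spark involved). It uses modified Gram-Schmidt, which is an assumption for illustration and not how Spark computes tallSkinnyQR. The class and method names (RowOrderQr, qr, permuteRows) are hypothetical. Because a row permutation P gives PA = (PQ)R, shuffling the input rows shuffles the rows of Q the same way and leaves R unchanged, up to the sign convention and rounding:

```java
final class RowOrderQr {
    /**
     * Thin QR by modified Gram-Schmidt. Returns {Q, R} with Q of size m x n
     * and R of size n x n. Assumes the input has full column rank.
     */
    static double[][][] qr(double[][] a) {
        int m = a.length, n = a[0].length;
        double[][] q = new double[m][];
        for (int i = 0; i < m; i++) q[i] = a[i].clone();
        double[][] r = new double[n][n];
        for (int j = 0; j < n; j++) {
            // Normalise column j; its norm becomes the diagonal entry of R.
            double norm = 0;
            for (int i = 0; i < m; i++) norm += q[i][j] * q[i][j];
            norm = Math.sqrt(norm);
            r[j][j] = norm;
            for (int i = 0; i < m; i++) q[i][j] /= norm;
            // Remove the component along column j from every later column.
            for (int k = j + 1; k < n; k++) {
                double dot = 0;
                for (int i = 0; i < m; i++) dot += q[i][j] * q[i][k];
                r[j][k] = dot;
                for (int i = 0; i < m; i++) q[i][k] -= dot * q[i][j];
            }
        }
        return new double[][][] {q, r};
    }

    /** Returns a copy of a whose i-th row is a[order[i]]. */
    static double[][] permuteRows(double[][] a, int[] order) {
        double[][] out = new double[order.length][];
        for (int i = 0; i < order.length; i++) out[i] = a[order[i]].clone();
        return out;
    }
}
```

Calling qr on a matrix and on permuteRows of that matrix should give the same R and a Q whose rows are permuted identically. This version keeps the diagonal of R positive, so signs can differ from what another implementation prints.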

You could ask why IndexedRowMatrix.toRowMatrix doesn't at least return
something with sorted rows; that would not be hard. But it also won't
return "missing" rows (all zeroes), so it would not in any event result in
a RowMatrix whose implicit rows and ordering represent the same matrix.
That, at least, strikes me as something that should be better documented.

Maybe it would be nicer still to at least sort the rows, given the
existence of use cases like yours. For example,
CoordinateMatrix.toIndexedRowMatrix could at least sort; that would be
less surprising.

In any event you should be able to make it work by manually getting the
RDD[IndexedRow] out of IndexedRowMatrix, sorting by index, then mapping it
to Vectors and making a RowMatrix from it.
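
The logic of that workaround, plus filling in the "missing" all-zero rows mentioned above, can be sketched in plain Java without Spark. Everything named here (OrderedRows, Row, inIndexOrder) is hypothetical and only stands in for the (index, vector) pairs; it assumes each index appears at most once and is smaller than numRows:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class OrderedRows {
    /** Hypothetical stand-in for an (index, vector) pair. */
    static final class Row {
        final long index;
        final double[] values;

        Row(long index, double[] values) {
            this.index = index;
            this.values = values;
        }
    }

    /**
     * Returns the rows in index order, inserting an all-zero row for every
     * index in [0, numRows) that is absent from the input.
     */
    static List<double[]> inIndexOrder(List<Row> rows, long numRows, int numCols) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingLong(r -> r.index));
        List<double[]> out = new ArrayList<>();
        long next = 0; // next index that still has to be emitted
        for (Row r : sorted) {
            while (next < r.index) { // pad the gap with zero rows
                out.add(new double[numCols]);
                next++;
            }
            out.add(r.values.clone());
            next = r.index + 1;
        }
        while (next < numRows) { // trailing missing rows
            out.add(new double[numCols]);
            next++;
        }
        return out;
    }
}
```

In Spark the same idea would apply to the RDD of IndexedRow: sort by index, map each row to its vector, and build the RowMatrix from the result. Whether that ordering is preserved through later distributed operations is worth checking against the output.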



On Tue, Nov 8, 2016 at 2:41 PM Iman Mohtashemi <iman.mohtashemi@gmail.com>
wrote:

> Hi Sean,
> Here you go:
>
> sparsematrix.txt =
>
> row,col,val
> 0,0,.42
> 0,1,.28
> 0,2,.89
> 1,0,.83
> 1,1,.34
> 1,2,.42
> 2,0,.23
> 3,0,.42
> 3,1,.98
> 3,2,.88
> 4,0,.23
> 4,1,.36
> 4,2,.97
>
> The vector is just the third column of the matrix, which should give the
> trivial solution of [0,0,1].
>
> This translates to the following matrix, which is correct. There are zeros
> in the matrix (it is not really sparse, just an example):
> 0.42  0.28  0.89
> 0.83  0.34  0.42
> 0.23  0.0   0.0
> 0.42  0.98  0.88
> 0.23  0.36  0.97
>
>
> Here is what I get for Q and R:
>
> Q:
> -0.21470961288429483  0.23590615093828807   0.6784910613691661
> -0.3920784235278427   -0.06171221388256143  0.5847874866876442
> -0.7748216464954987   -0.4003560542230838   -0.29392323671555354
> -0.3920784235278427   0.8517909521421976    -0.31435038559403217
> -0.21470961288429483  -0.23389547730301666  -0.11165321782745863
>
> R:
> -1.0712142642814275  -0.8347536340918976  -1.227672225670157
> 0.0                  0.7662808691141717   0.7553315911660984
> 0.0                  0.0                  0.7785210939368136
>
> When I run this in MATLAB, the numbers are the same, but row 1 is the last
> row and the last row is interchanged with row 3.
>
>
>
> On Mon, Nov 7, 2016 at 11:35 PM Sean Owen <sowen@cloudera.com> wrote:
>
> > Rather than post a large section of code, please post a small example of
> > the input matrix and its decomposition, to illustrate what you're saying is
> > out of order.
> >
> > On Tue, Nov 8, 2016 at 3:50 AM im281 <iman.mohtashemi@gmail.com> wrote:
> >
> > > I am getting the correct rows but they are out of order. Is this a bug or
> > > am I doing something wrong?
>
>
>
