spark-user mailing list archives

From Xianyang Liu <xyliu0...@icloud.com>
Subject Re: TallSkinnyQR
Date Sat, 07 Oct 2017 06:44:55 GMT
<div dir='auto'></div><div class="gmail_extra"><br><div class="gmail_quote">On Oct 7, 2017,
5:29 AM, Iman Mohtashemi &lt;iman.mohtashemi@gmail.com&gt; wrote:<br type="attribution"><blockquote
class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div
dir="ltr">Hi guys,<div>Here is another problem I encountered using tallSkinnyQR.
I&#39;ve attached documentation that clearly describes the problem. I posted it on the forum,
but I&#39;m not sure whether it went through.</div><div>Best regards,</div><div>Iman</div><br
/><div class="elided-text"><div dir="ltr">On Fri, Dec 30, 2016 at 9:22 AM Sean
Owen &lt;<a href="mailto:sowen&#64;cloudera.com">sowen&#64;cloudera.com</a>&gt;
wrote:<br /></div><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc
solid;padding-left:1ex">There are no changes to Spark at all here. See my workaround below. <br
/><br /><div class="elided-text"><div dir="ltr">On Fri, Dec 30, 2016,
17:18 Iman Mohtashemi &lt;<a href="mailto:iman.mohtashemi&#64;gmail.com">iman.mohtashemi&#64;gmail.com</a>&gt;
wrote:<br /></div><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc
solid;padding-left:1ex"><div dir="auto">Hi guys,<div dir="auto">Are your changes/bug
fixes reflected in the Spark 2.1 release?</div></div><div dir="auto"><div
dir="auto">Iman</div></div><div><br /><div>On Dec 2, 2016
3:03 PM, &#34;Iman Mohtashemi&#34; &lt;<a href="mailto:iman.mohtashemi&#64;gmail.com">iman.mohtashemi&#64;gmail.com</a>&gt;
wrote:<br /><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex"><div
dir="auto">Thanks again! This is very helpful!<div dir="auto">Best regards,</div><div
dir="auto">Iman</div></div><div><br /><div>On Dec 2, 2016
2:49 PM, &#34;Huamin Li&#34; &lt;<a href="mailto:3ericli&#64;gmail.com">3ericli&#64;gmail.com</a>&gt;
wrote:<br /><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex"><div
dir="ltr">Hi Iman,<div><br /></div><div>You can get my code from <a
href="https://github.com/hl475/svd/tree/testSVD">https://github.com/hl475/svd/tree/testSVD</a>.
In addition to fixing the index issue for IndexedRowMatrix (<a href="https://issues.apache.org/jira/browse/SPARK-8614">https://issues.apache.org/jira/browse/SPARK-8614</a>),
I have made the following changes:</div><div><br /></div><div>(1)
Add tallSkinnySVD and computeSVDbyGram to IndexedRowMatrix.</div><div>(2) Add
shuffle.scala to <span style="font-size:12.8px">mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/
(you need this if you want to use tallSkinnySVD). There was a bug in the shuffle method in
breeze, and I submitted a pull request for it: <a href="https://github.com/scalanlp/breeze/pull/571">https://github.com/scalanlp/breeze/pull/571</a>.
However, that pull request was merged into breeze 0.13, whereas the current version of Spark uses
breeze 0.12.</span></div><div>(3) Add partialSVD to BlockMatrix
which computes the randomized singular value decomposition of a given BlockMatrix.<span
style="font-size:12.8px"><br /></span></div><div><br /></div><div>The
new SVD methods (tallSkinnySVD, computeSVDbyGram, and partialSVD) are still in beta.
You are welcome to test them and share your feedback with me!</div><div><br
/></div><div>I implemented this code for my summer internship project with Mark
Tygert, and we are currently testing its performance.</div><div><br
/></div><div>Best,</div><div>Huamin</div></div><div><br
/><div>On Fri, Dec 2, 2016 at 2:07 PM, Iman Mohtashemi <span dir="ltr">&lt;<a
href="mailto:iman.mohtashemi&#64;gmail.com">iman.mohtashemi&#64;gmail.com</a>&gt;</span>
wrote:<br /><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex"><div
dir="ltr">Great, thanks! Where can I get the latest version with the bug fixes?<div>best regards,<br
/><div>Iman</div></div></div><div><div><br /><div><div
dir="ltr">On Fri, Dec 2, 2016 at 10:54 AM Huamin Li &lt;<a href="mailto:3ericli&#64;gmail.com">3ericli&#64;gmail.com</a>&gt;
wrote:<br /></div><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc
solid;padding-left:1ex"><div dir="ltr"><div><span style="font-size:14px;color:rgb(
51 , 51 , 51 )">Hi,</span></div><div><span style="font-size:14px;color:rgb(
51 , 51 , 51 )"><br /></span></div><div><font color="#333333"><span
style="font-size:14px">There seems to be a bug in the section of code that converts the
RowMatrix format back into IndexedRowMatrix format. </span></font></div><div><span
style="font-size:14px;color:rgb( 51 , 51 , 51 )"><br /></span></div><div><span
style="font-size:14px;color:rgb( 51 , 51 , 51 )">For RowMatrix, I think the singular values and right singular
vectors that computeSVD computes are correct when using multiple executors/machines, but the
left singular vectors U are not; likewise, only the R (not the Q) from tallSkinnyQR is correct when using
multiple executors/machines. U and Q were being stored in RowMatrix format, and RowMatrix carries no
row index information, so the row order of U and Q is not meaningful.</span><br /></div><div><span
style="color:rgb( 51 , 51 , 51 );font-size:14px"><br /></span></div><div><span
style="color:rgb( 51 , 51 , 51 );font-size:14px">Others have run into this same problem
(</span><font color="#333333"><span style="font-size:14px"><a href="https://issues.apache.org/jira/browse/SPARK-8614">https://issues.apache.org/jira/browse/SPARK-8614</a>)</span></font></div><div><br
/></div><div><font color="#333333"><span style="font-size:14px">I
think the quick solution for this problem is to copy the </span></font><span
style="color:rgb( 51 , 51 , 51 );font-size:14px">multiply, computeSVD, and tallSkinnyQR </span><font
color="#333333"><span style="font-size:14px">code from RowMatrix to IndexedRowMatrix
and make the corresponding changes, although this would result in code duplication.</span></font><span
style="color:rgb( 51 , 51 , 51 );font-size:14px"><br /></span></div><div><font
color="#333333"><span style="font-size:14px"><br /></span></font></div><div><font
color="#333333"><span style="font-size:14px">I have fixed the problem as described
above. Now multiply, computeSVD, and tallSkinnyQR give correct results for IndexedRowMatrix
when using multiple executors or workers. Let me know if I should open a pull request for this.</span></font></div><div><span
style="color:rgb( 51 , 51 , 51 );font-size:14px"><br /></span></div><div><span
style="color:rgb( 51 , 51 , 51 );font-size:14px">Best,</span></div><div><span
style="color:rgb( 51 , 51 , 51 );font-size:14px">Huamin</span></div></div><div><br
/><div>On Fri, Dec 2, 2016 at 11:23 AM, Iman Mohtashemi <span dir="ltr">&lt;<a
href="mailto:iman.mohtashemi&#64;gmail.com">iman.mohtashemi&#64;gmail.com</a>&gt;</span>
wrote:<br /><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc solid;padding-left:1ex"><div
dir="ltr">Ok thanks. </div><div><div><br /><div><div
dir="ltr">On Fri, Dec 2, 2016 at 8:19 AM Sean Owen &lt;<a href="mailto:sowen&#64;cloudera.com">sowen&#64;cloudera.com</a>&gt;
wrote:<br /></div><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc
solid;padding-left:1ex"><div dir="ltr">I tried, but enforcing the ordering changed
a fair bit of behavior and I gave up. I think the way to think of it is: a RowMatrix has whatever
ordering you made it with, so you need to give it ordered rows if you&#39;re going to
use a method like the QR decomposition. That works. I don&#39;t think the QR method should
ever have been on this class though, for this reason.</div><br /><div><div
dir="ltr">On Fri, Dec 2, 2016 at 4:13 PM Iman Mohtashemi &lt;<a href="mailto:iman.mohtashemi&#64;gmail.com">iman.mohtashemi&#64;gmail.com</a>&gt;
wrote:<br /></div><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc
solid;padding-left:1ex"><div dir="ltr">Hi guys,<div>Was this bug ever resolved?</div></div><div
dir="ltr"><div>Iman</div></div><br /><div><div dir="ltr">On
Fri, Nov 11, 2016 at 9:59 AM Iman Mohtashemi &lt;<a href="mailto:iman.mohtashemi&#64;gmail.com">iman.mohtashemi&#64;gmail.com</a>&gt;
wrote:<br /></div><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc
solid;padding-left:1ex"><div dir="ltr">Yes, this would be helpful; otherwise the Q
part of the decomposition is useless. With a correct Q, one can solve Ax &#61; b by transposing Q,
multiplying it with b, and then solving Rx &#61; Qt*b for x, since the upper
triangular matrix R is already correctly available.</div><br /><div><div
dir="ltr">On Fri, Nov 11, 2016 at 3:56 AM Sean Owen &lt;<a href="mailto:sowen&#64;cloudera.com">sowen&#64;cloudera.com</a>&gt;
wrote:<br /></div><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc
solid;padding-left:1ex"><div dir="ltr">&#64;Xiangrui / &#64;Joseph, do you
think it would be reasonable to have CoordinateMatrix sort the rows it creates to make an
IndexedRowMatrix? in order to make the ultimate output of toRowMatrix less surprising when
it&#39;s not ordered?</div><div dir="ltr"><br /><br /><div><div
dir="ltr">On Tue, Nov 8, 2016 at 3:29 PM Sean Owen &lt;<a href="mailto:sowen&#64;cloudera.com">sowen&#64;cloudera.com</a>&gt;
wrote:<br /></div><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc
solid;padding-left:1ex"><div dir="ltr"><div>I think the problem here is that
IndexedRowMatrix.toRowMatrix does *not* result in a RowMatrix with rows in order of their
indices, necessarily:</div></div><div dir="ltr"><div><br /><br
/>// Drop its row indices. <br />                RowMatrix rowMat &#61; indexedRowMatrix.toRowMatrix();
<br /><br /></div></div><div dir="ltr"><div>What you get
is a matrix where the rows are arranged in whatever order they were passed to IndexedRowMatrix.
RowMatrix says it&#39;s for rows where the ordering doesn&#39;t matter, but then it&#39;s
perhaps surprising that it has a QR decomposition method, because the result clearly depends on the
order of rows in the input. (CC Yuhao Yang for a comment?)</div><div><br /></div><div>You
could say, well, why doesn&#39;t IndexedRowMatrix.toRowMatrix return at least something
with sorted rows? That would not be hard. It also won&#39;t return &#34;missing&#34;
rows (all zeroes), so it would not in any event result in a RowMatrix whose implicit rows
and ordering represented the same matrix. That, at least, strikes me as something to be better
documented. </div><div><br /></div><div>Maybe it would be nicer
still to sort the rows, given use cases like yours. For example,
CoordinateMatrix.toIndexedRowMatrix could at least sort its output; that would be less surprising.</div><div><br
/></div><div>In any event you should be able to make it work by manually getting
the RDD[IndexedRow] out of IndexedRowMatrix, sorting by index, then mapping it to Vectors
and making a RowMatrix from it.</div></div><div dir="ltr"><div><br
/></div><div><br /><br /><div><div dir="ltr">On Tue,
Nov 8, 2016 at 2:41 PM Iman Mohtashemi &lt;<a href="mailto:iman.mohtashemi&#64;gmail.com">iman.mohtashemi&#64;gmail.com</a>&gt;
wrote:<br /></div><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc
solid;padding-left:1ex"><div dir="ltr">Hi Sean,<div>Here you go:</div><div><br
/></div><div>sparsematrix.txt &#61; </div><div><br /></div><div>row, col, val</div><div><div>0,0,.42</div><div>0,1,.28</div><div>0,2,.89</div><div>1,0,.83</div><div>1,1,.34</div><div>1,2,.42</div><div>2,0,.23</div><div>3,0,.42</div><div>3,1,.98</div><div>3,2,.88</div><div>4,0,.23</div><div>4,1,.36</div><div>4,2,.97</div></div><div><br
/></div><div>The vector b is just the third column of the matrix, which should
give the trivial solution x &#61; [0,0,1].</div><div><br /></div><div>This
translates to the following matrix, which is correct.</div><div>There are zeros in the matrix (it is not
really sparse; this is just an example):</div><div><div>0.42  0.28  0.89  </div><div>0.83
 0.34  0.42  </div><div>0.23  0.0   0.0   </div><div>0.42  0.98
 0.88  </div><div>0.23  0.36  0.97  </div></div><div><br
/></div><div><br /></div><div>Here is what I get for
Q and R:</div><div><br /></div><div><div>Q: -0.21470961288429483
 0.23590615093828807   0.6784910613691661    </div><div>-0.3920784235278427
  -0.06171221388256143  0.5847874866876442    </div><div>-0.7748216464954987
  -0.4003560542230838   -0.29392323671555354  </div><div>-0.3920784235278427
  0.8517909521421976    -0.31435038559403217  </div><div>-0.21470961288429483
 -0.23389547730301666  -0.11165321782745863  </div><div>R: -1.0712142642814275
 -0.8347536340918976  -1.227672225670157  </div><div>0.0            
     0.7662808691141717   0.7553315911660984  </div><div>0.0        
         0.0                  0.7785210939368136  </div></div><div><br
/></div><div>When I run this in MATLAB, the numbers are the same, but row 1
is the last row, and the last row is interchanged with row 3.</div><div><br /></div><div><br
/></div></div><br /><div><div dir="ltr">On Mon, Nov 7, 2016
at 11:35 PM Sean Owen &lt;<a href="mailto:sowen&#64;cloudera.com">sowen&#64;cloudera.com</a>&gt;
wrote:<br /></div><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc
solid;padding-left:1ex"><div dir="ltr">Rather than post a large section of code,
please post a small example of the input matrix and its decomposition, to illustrate what
you&#39;re saying is out of order.<br /><br /><div><div dir="ltr">On
Tue, Nov 8, 2016 at 3:50 AM im281 &lt;<a href="mailto:iman.mohtashemi&#64;gmail.com">iman.mohtashemi&#64;gmail.com</a>&gt;
wrote:<br /></div><blockquote style="margin:0 0 0 0.8ex;border-left:1px #ccc
solid;padding-left:1ex">I am getting the correct rows but they are out of order. Is this
a bug or am<br />
I doing something wrong?<br />
<br /><br />
</blockquote></div></div>
</blockquote></div>
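Why the numbers come back right but in the wrong rows (a short derivation added for illustration, not from the thread): if the rows of A reach tallSkinnyQR in some other order, the routine is really factoring PA for some permutation matrix P. Since a permutation matrix satisfies PᵀP = I,

$$
A = QR \;\Longrightarrow\; PA = (PQ)\,R, \qquad (PQ)^\top (PQ) = Q^\top P^\top P\, Q = Q^\top Q = I .
$$

So PQ is a valid orthonormal factor of PA with the same R. The thin QR factorization of a full-column-rank matrix is unique up to the signs of the diagonal entries of R (with matching sign flips in the columns of Q), so R agrees with MATLAB's up to those signs, while Q contains the same rows as MATLAB's Q, reordered by P, which is what the example above shows.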
</blockquote></div></div></div></blockquote></div></div></blockquote></div></blockquote></div>
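Iman's Nov 11 point about solving Ax = b with R and Qᵀb can be made concrete. The following is a local, Spark-free sketch under our own assumptions: it uses modified Gram-Schmidt rather than the TSQR-style algorithm Spark uses, and the names QrSolveSketch, qr and solve are hypothetical. It factors the 5 x 3 example matrix from this thread with its rows in index order, forms Qᵀb for b equal to the third column, and back-substitutes R x = Qᵀb, which should recover x ≈ [0, 0, 1].

```java
import java.util.Arrays;

/**
 * Local illustration only (not Spark's algorithm): thin QR by modified Gram-Schmidt,
 * then the least-squares solution of A x = b from R x = Q^T b.
 * Assumes A is m x n with m >= n and full column rank.
 */
class QrSolveSketch {

    /** Returns {Q, R}: Q is m x n with orthonormal columns, R is n x n upper triangular. */
    static double[][][] qr(double[][] a) {
        int m = a.length, n = a[0].length;
        double[][] q = new double[m][];
        for (int i = 0; i < m; i++) q[i] = a[i].clone();  // columns of Q start as columns of A
        double[][] r = new double[n][n];
        for (int k = 0; k < n; k++) {
            // Normalize column k; its length becomes R[k][k] (always positive here).
            double norm = 0;
            for (int i = 0; i < m; i++) norm += q[i][k] * q[i][k];
            norm = Math.sqrt(norm);
            r[k][k] = norm;
            for (int i = 0; i < m; i++) q[i][k] /= norm;
            // Remove the new direction from every later column (the "modified" step).
            for (int j = k + 1; j < n; j++) {
                double dot = 0;
                for (int i = 0; i < m; i++) dot += q[i][k] * q[i][j];
                r[k][j] = dot;
                for (int i = 0; i < m; i++) q[i][j] -= dot * q[i][k];
            }
        }
        return new double[][][] {q, r};
    }

    /** Solves R x = Q^T b by back substitution (R is upper triangular). */
    static double[] solve(double[][] a, double[] b) {
        double[][][] f = qr(a);
        double[][] q = f[0], r = f[1];
        int m = a.length, n = a[0].length;
        double[] qtb = new double[n];  // Q^T b
        for (int k = 0; k < n; k++)
            for (int i = 0; i < m; i++) qtb[k] += q[i][k] * b[i];
        double[] x = new double[n];
        for (int k = n - 1; k >= 0; k--) {
            double s = qtb[k];
            for (int j = k + 1; j < n; j++) s -= r[k][j] * x[j];
            x[k] = s / r[k][k];
        }
        return x;
    }

    public static void main(String[] args) {
        // The 5 x 3 example matrix from the thread, rows in index order.
        double[][] a = {
            {0.42, 0.28, 0.89},
            {0.83, 0.34, 0.42},
            {0.23, 0.00, 0.00},
            {0.42, 0.98, 0.88},
            {0.23, 0.36, 0.97},
        };
        double[] b = {0.89, 0.42, 0.00, 0.88, 0.97};  // third column of a
        System.out.println(Arrays.toString(solve(a, b)));  // approximately [0, 0, 1]
    }
}
```

Because modified Gram-Schmidt always produces a positive diagonal in R, passing the same rows in a different order to qr(...) returns the same R (up to rounding) and a Q whose rows are permuted in the same way. Spark or MATLAB may instead return some rows of R negated, together with the matching columns of Q; those sign choices do not change x. If the rows of Q are out of order relative to b, however, Qᵀb is wrong, which is why Q has to follow the row indices.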
</blockquote></div>
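Sean's suggestion (give RowMatrix its rows already ordered by index) can be sketched locally. This is plain Java with a hypothetical IndexedRowSketch class standing in for Spark's IndexedRow, and it assumes row indices are unique and lie in [0, numRows). The optional zero-filling addresses the point that toRowMatrix does not emit missing all-zero rows. In Spark's Java API, the analogous steps would, as far as we understand it, be to sort indexedRowMatrix.rows().toJavaRDD() with sortBy(r -> r.index(), true, numPartitions), map each IndexedRow to r.vector(), and wrap the result in new RowMatrix(sortedVectors.rdd()); that mapping is our reading of the API, not code from this thread.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Local sketch of "sort by row index before building a RowMatrix" (no Spark involved). */
class SortedRowsSketch {

    /** Hypothetical stand-in for Spark's IndexedRow: a row index plus its values. */
    static final class IndexedRowSketch {
        final long index;
        final double[] values;

        IndexedRowSketch(long index, double[] values) {
            this.index = index;
            this.values = values;
        }
    }

    /**
     * Returns the row vectors ordered by index (indices are assumed unique).
     * If fillMissing is true, every index in [0, numRows) without a stored row becomes
     * an all-zero row, so element i of the result is row i of the matrix.
     */
    static double[][] toOrderedRows(IndexedRowSketch[] rows, int numRows, int numCols,
                                    boolean fillMissing) {
        IndexedRowSketch[] sorted = rows.clone();  // leave the caller's array untouched
        Arrays.sort(sorted, Comparator.comparingLong((IndexedRowSketch r) -> r.index));
        double[][] out = new double[fillMissing ? numRows : sorted.length][];
        int pos = 0;
        for (IndexedRowSketch r : sorted) {
            if (fillMissing) {
                while (pos < r.index) out[pos++] = new double[numCols];  // never-stored row: zeros
            }
            out[pos++] = r.values;
        }
        while (pos < out.length) out[pos++] = new double[numCols];  // trailing missing rows
        return out;
    }

    public static void main(String[] args) {
        // Rows arrive out of order and row 2 was never stored (e.g. an all-zero row).
        IndexedRowSketch[] rows = {
            new IndexedRowSketch(3, new double[] {3, 3}),
            new IndexedRowSketch(0, new double[] {0, 0.5}),
            new IndexedRowSketch(1, new double[] {1, 1}),
        };
        for (double[] row : toOrderedRows(rows, 4, 2, true))
            System.out.println(Arrays.toString(row));
        // prints [0.0, 0.5], [1.0, 1.0], [0.0, 0.0], [3.0, 3.0] on separate lines
    }
}
```

Zero rows do not change R (they add nothing to AᵀA), but filling them in keeps row i of Q aligned with entry i of b.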
</blockquote></div>
</div></div></blockquote></div><br /></div>
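Huamin's fix keeps the row indices attached while computing U and Q. Below is a minimal sketch of that idea for Q, assuming (as we understand Spark's RowMatrix.tallSkinnyQR to work) that Q is formed row by row as Q = A R⁻¹ once R is known. Each indexed row is transformed independently, so its index travels with it however the rows are ordered or partitioned. IndexedQSketch and its members are hypothetical names; Huamin's actual changes are in the branch linked earlier in this thread and may differ.

```java
import java.util.Arrays;

/** Sketch: compute Q = A R^-1 one indexed row at a time so row indices are never lost. */
class IndexedQSketch {

    /** Hypothetical stand-in for Spark's IndexedRow. */
    static final class IndexedRowSketch {
        final long index;
        final double[] values;

        IndexedRowSketch(long index, double[] values) {
            this.index = index;
            this.values = values;
        }
    }

    /** Solves q R = a for q (i.e. q = a R^-1) by forward substitution; R is upper triangular. */
    static double[] timesRInverse(double[] a, double[][] r) {
        int n = r.length;
        double[] q = new double[n];
        for (int j = 0; j < n; j++) {
            // Entry j of q R is sum over k up to j of q[k] * R[k][j]; only q[j] is still unknown.
            double s = a[j];
            for (int k = 0; k < j; k++) s -= q[k] * r[k][j];
            q[j] = s / r[j][j];
        }
        return q;
    }

    /** Maps every (index, row of A) to (same index, row of Q); input order does not matter. */
    static IndexedRowSketch[] qRows(IndexedRowSketch[] aRows, double[][] r) {
        IndexedRowSketch[] out = new IndexedRowSketch[aRows.length];
        for (int i = 0; i < aRows.length; i++)
            out[i] = new IndexedRowSketch(aRows[i].index, timesRInverse(aRows[i].values, r));
        return out;
    }

    public static void main(String[] args) {
        // R is an arbitrary invertible upper-triangular matrix, just to exercise the arithmetic.
        double[][] r = {{2, 1}, {0, 3}};
        IndexedRowSketch[] aRows = {
            new IndexedRowSketch(7, new double[] {4, 5}),
            new IndexedRowSketch(2, new double[] {2, 1}),
        };
        for (IndexedRowSketch q : qRows(aRows, r))
            System.out.println(q.index + " -> " + Arrays.toString(q.values));
        // prints "7 -> [2.0, 1.0]" then "2 -> [1.0, 0.0]"
    }
}
```

Solving q R = a by substitution avoids forming R⁻¹ explicitly; in exact arithmetic both give the same q.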
</blockquote></div>
</div></div></blockquote></div><br /></div>
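We have not inspected computeSVDbyGram in the linked branch; this sketch only illustrates the textbook Gram-matrix idea its name suggests, and why the singular values (and R) survive a scrambled row order while U and Q do not. AᵀA is a sum of per-row outer products, so it does not depend on row order. The singular values of A are the square roots of its eigenvalues, the right singular vectors are its eigenvectors, and the R from a QR factorization satisfies RᵀR = AᵀA. To keep the example short and self-contained, it handles only two columns, where the eigenvalues of the symmetric 2 x 2 Gram matrix have a closed form; GramSvdSketch is a hypothetical name.

```java
import java.util.Arrays;

/** Sketch: singular values of an m x 2 matrix via its (row-order-independent) Gram matrix. */
class GramSvdSketch {

    /** A^T A accumulated as a sum of per-row outer products; row order does not matter. */
    static double[][] gram(double[][] rows) {
        int n = rows[0].length;
        double[][] g = new double[n][n];
        for (double[] row : rows)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    g[i][j] += row[i] * row[j];
        return g;
    }

    /** Singular values, largest first, of an m x 2 matrix from its 2 x 2 Gram matrix. */
    static double[] singularValues2(double[][] rows) {
        double[][] g = gram(rows);
        double mean = (g[0][0] + g[1][1]) / 2;
        double half = (g[0][0] - g[1][1]) / 2;
        double rad = Math.sqrt(half * half + g[0][1] * g[0][1]);
        // Eigenvalues of a symmetric 2 x 2 matrix are mean +/- rad; clamp rounding below zero.
        return new double[] {Math.sqrt(mean + rad), Math.sqrt(Math.max(mean - rad, 0))};
    }

    public static void main(String[] args) {
        double[][] rows = {{3, 0}, {0, 4}, {0, 0}};
        double[][] shuffled = {{0, 0}, {0, 4}, {3, 0}};
        System.out.println(Arrays.toString(singularValues2(rows)));      // [4.0, 3.0]
        System.out.println(Arrays.toString(singularValues2(shuffled)));  // [4.0, 3.0]
    }
}
```

A general version would need a symmetric eigensolver. The usual caveat of the Gram approach is that forming AᵀA squares the condition number, so small singular values lose accuracy.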
</blockquote></div></div>
</blockquote></div></div>
</blockquote></div>
</blockquote></div></div>
</blockquote></div><br></div>
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
