spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chunnan Yao <yaochun...@gmail.com>
Subject Re: How can I implement eigenvalue decomposition in Spark?
Date Fri, 08 Aug 2014 07:07:26 GMT
I think the eigenvalues and eigenvectors you are talking about is that of
M^T*M or M*M^T, if we get M=U*s*V^T as SVD. What I want is to get
eigenvectors and eigenvalues of M itself. Is this my misunderstanding of
linear algebra or the API?

[image: M^{*} M = V \Sigma^{*} U^{*}\, U \Sigma V^{*} = V (\Sigma^{*}
\Sigma) V^{*}\,][image: M M^{*} = U \Sigma V^{*} \, V \Sigma^{*} U^{*} = U
(\Sigma \Sigma^{*}) U^{*}\,]



2014-08-08 11:19 GMT+08:00 x <wasedaxiao@gmail.com>:

> U.rows.toArray.take(1).foreach(println) and V.toArray.take(s.size).foreach(println)
> are not eigenvectors corresponding to the biggest eigenvalue
> s.toArray(0)*s.toArray(0)?
>
> xj @ Tokyo
>
>
> On Fri, Aug 8, 2014 at 12:07 PM, Chunnan Yao <yaochunnan@gmail.com> wrote:
>
>> Hi there, what you've suggested are all meaningful. But to make myself
>> clearer, my essential problems are:
>> 1. My matrix is asymmetric, and it is a probabilistic adjacency matrix,
>> whose entries(a_ij) represents the likelihood that user j will broadcast
>> the information generated by user i. Apparently, a_ij and a_ji is
>> different, caus I love you doesn't necessarily mean you love me(What a sad
>> story~). All entries are real.
>> 2. I know I can get eigenvalues through SVD. My problem is I can't get
>> the corresponding eigenvectors, which requires solving equations, and I
>> also need eigenvectors in my calculation.In my simulation of this paper, I
>> only need the biggest eigenvalues and corresponding eigenvectors.
>> The paper posted by Shivaram Venkataraman is also concerned about
>> symmetric matrix. Could any one help me out?
>>
>>
>> 2014-08-08 9:41 GMT+08:00 x <wasedaxiao@gmail.com>:
>>
>>  The SVD computed result already contains descending order of singular
>>> values, you can get the biggest eigenvalue.
>>>
>>> ---
>>>
>>>   val svd = matrix.computeSVD(matrix.numCols().toInt, computeU = true)
>>>   val U: RowMatrix = svd.U
>>>   val s: Vector = svd.s
>>>   val V: Matrix = svd.V
>>>
>>>   U.rows.toArray.take(1).foreach(println)
>>>
>>>   println(s.toArray(0)*s.toArray(0))
>>>
>>>   println(V.toArray.take(s.size).foreach(println))
>>>
>>> ---
>>>
>>> xj @ Tokyo
>>>
>>>
>>> On Fri, Aug 8, 2014 at 3:06 AM, Shivaram Venkataraman <
>>> shivaram@eecs.berkeley.edu> wrote:
>>>
>>>> If you just want to find the top eigenvalue / eigenvector you can do
>>>> something like the Lanczos method. There is a description of a MapReduce
>>>> based algorithm in Section 4.2 of [1]
>>>>
>>>> [1] http://www.cs.cmu.edu/~ukang/papers/HeigenPAKDD2011.pdf
>>>>
>>>>
>>>> On Thu, Aug 7, 2014 at 10:54 AM, Li Pu <lpu@twitter.com.invalid> wrote:
>>>>
>>>>> @Miles, the latest SVD implementation in mllib is partially
>>>>> distributed. Matrix-vector multiplication is computed among all workers,
>>>>> but the right singular vectors are all stored in the driver. If your
>>>>> symmetric matrix is n x n and you want the first k eigenvalues, you will
>>>>> need to fit n x k doubles in driver's memory. Behind the scene, it calls
>>>>> ARPACK to compute eigen-decomposition of A^T A. You can look into the
>>>>> source code for the details.
>>>>>
>>>>> @Sean, the SVD++ implementation in graphx is not the canonical
>>>>> definition of SVD. It doesn't have the orthogonality that SVD holds.
But we
>>>>> might want to use graphx as the underlying matrix representation for
>>>>> mllib.SVD to address the problem of skewed entry distribution.
>>>>>
>>>>>
>>>>> On Thu, Aug 7, 2014 at 10:51 AM, Evan R. Sparks <evan.sparks@gmail.com
>>>>> > wrote:
>>>>>
>>>>>> Reza Zadeh has contributed the distributed implementation of
>>>>>> (Tall/Skinny) SVD (
>>>>>> http://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html),
>>>>>> which is in MLlib (Spark 1.0) and a distributed sparse SVD coming
in Spark
>>>>>> 1.1. (https://issues.apache.org/jira/browse/SPARK-1782). If your
>>>>>> data is sparse (which it often is in social networks), you may have
better
>>>>>> luck with this.
>>>>>>
>>>>>> I haven't tried the GraphX implementation, but those algorithms are
>>>>>> often well-suited for power-law distributed graphs as you might see
in
>>>>>> social networks.
>>>>>>
>>>>>> FWIW, I believe you need to square elements of the sigma matrix from
>>>>>> the SVD to get the eigenvalues.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 7, 2014 at 10:20 AM, Sean Owen <sowen@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> (-incubator, +user)
>>>>>>>
>>>>>>> If your matrix is symmetric (and real I presume), and if my linear
>>>>>>> algebra isn't too rusty, then its SVD is its eigendecomposition.
The
>>>>>>> SingularValueDecomposition object you get back has U and V, both
of
>>>>>>> which have columns that are the eigenvectors.
>>>>>>>
>>>>>>> There are a few SVDs in the Spark code. The one in mllib is not
>>>>>>> distributed (right?) and is probably not an efficient means of
>>>>>>> computing eigenvectors if you really just want a decomposition
of a
>>>>>>> symmetric matrix.
>>>>>>>
>>>>>>> The one I see in graphx is distributed? I haven't used it though.
>>>>>>> Maybe it could be part of a solution.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 7, 2014 at 2:21 PM, yaochunnan <yaochunnan@gmail.com>
>>>>>>> wrote:
>>>>>>> > Our lab need to do some simulation on online social networks.
We
>>>>>>> need to
>>>>>>> > handle a 5000*5000 adjacency matrix, namely, to get its
largest
>>>>>>> eigenvalue
>>>>>>> > and corresponding eigenvector. Matlab can be used but it
is
>>>>>>> time-consuming.
>>>>>>> > Is Spark effective in linear algebra calculations and
>>>>>>> transformations? Later
>>>>>>> > we would have 5000000*5000000 matrix processed. It seems
emergent
>>>>>>> that we
>>>>>>> > should find some distributed computation platform.
>>>>>>> >
>>>>>>> > I see SVD has been implemented and I can get eigenvalues
of a
>>>>>>> matrix through
>>>>>>> > this API.  But when I want to get both eigenvalues and
>>>>>>> eigenvectors or at
>>>>>>> > least the biggest eigenvalue and the corresponding eigenvector,
it
>>>>>>> seems
>>>>>>> > that current Spark doesn't have such API. Is it possible
that I
>>>>>>> write
>>>>>>> > eigenvalue decomposition from scratch? What should I do?
Thanks a
>>>>>>> lot!
>>>>>>> >
>>>>>>> >
>>>>>>> > Miles Yao
>>>>>>> >
>>>>>>> > ________________________________
>>>>>>> > View this message in context: How can I implement eigenvalue
>>>>>>> decomposition
>>>>>>> > in Spark?
>>>>>>> > Sent from the Apache Spark User List mailing list archive
at
>>>>>>> Nabble.com.
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Li
>>>>> @vrilleup
>>>>>
>>>>
>>>>
>>>
>>
>

Mime
View raw message