spark-user mailing list archives

From Toni Verbeiren <t...@data-intuitive.com>
Subject Re: RowMatrix multiplication
Date Thu, 15 Jan 2015 08:49:19 GMT
You can always define an RDD transpose function yourself. This is what I use in PySpark to
transpose an RDD of numpy vectors. It’s not optimal, and each transposed vector (one full
column of the original) needs to fit in memory on a worker node.
import numpy as np

def rddTranspose(rdd):
    # add an index to the rows and the columns, resulting in triplets (i, j, e)
    dataT1 = rdd.zipWithIndex().flatMap(
        lambda xi: [(xi[1], j, e) for (j, e) in enumerate(xi[0])])
    # use the column index from the original as key, then group and sort by it
    dataT2 = dataT1.map(lambda t: (t[1], (t[0], t[2]))) \
                   .groupByKey().sortByKey()
    # drop the column key and sort each new row by original row index
    dataT3 = dataT2.map(lambda kv: sorted(kv[1], key=lambda p: p[0]))
    # remove the indices inside the rows
    dataT4 = dataT3.map(lambda row: [e for (_, e) in row])
    # convert the rows to numpy arrays
    return dataT4.map(lambda row: np.asarray(row))
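
To see what the steps above do without a cluster, here is a minimal local sketch of the same index, re-key, sort, and strip sequence on a plain Python list. The name local_transpose is hypothetical and used only for illustration; it is not part of PySpark, and it sidesteps the memory concern because everything is local.

```python
from itertools import groupby

def local_transpose(rows):
    """Transpose a list of equal-length rows using the same steps as the RDD version."""
    # Step 1: tag each element with its row index i and column index j
    triplets = [(i, j, e) for i, row in enumerate(rows) for j, e in enumerate(row)]
    # Step 2: re-key by the original column index j and sort by (j, i)
    keyed = sorted((j, i, e) for (i, j, e) in triplets)
    # Step 3: group by j and drop the indices; each group is one transposed row
    return [[e for (_, _, e) in group]
            for _, group in groupby(keyed, key=lambda t: t[0])]

matrix = [[1, 2, 3],
          [4, 5, 6]]
transposed = local_transpose(matrix)
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```

Each output row collects one full column of the input, which is why a column has to fit in memory on a single worker in the distributed version.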

Cheers,
Toni

On 12 Jan 2015 at 20:45:58, Alex Minnaar (aminnaar@verticalscope.com) wrote:

That's not quite what I'm looking for.  Let me provide an example.  I have a rowmatrix A
that is nxm and I have two local matrices b and c.  b is mx1 and c is nx1.  In my spark
job I wish to perform the following two computations

A*b

and

A^T*c

I don't think this is possible without being able to transpose a rowmatrix.  Am I correct?
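
For the particular case where c is nx1, A^T*c equals the sum over i of c_i times row i of A, so it can in principle be formed from the rows alone, without a transpose. The following is a minimal sketch of that identity using plain Python lists in place of an RDD; at_times_c is a hypothetical name, and on a real RowMatrix one would have to pair each row with its index (for example via zipWithIndex on the row RDD) and make sure the row order matches c.

```python
from functools import reduce

def at_times_c(rows, c):
    """Compute A^T * c as sum_i c[i] * (row i of A), without transposing A."""
    # Scale each row by its matching entry of c (a map step on an RDD of rows)
    scaled = ([c_i * a for a in row] for row, c_i in zip(rows, c))
    # Add the scaled rows element-wise (a reduce step on an RDD)
    return reduce(lambda u, v: [x + y for x, y in zip(u, v)], scaled)

A = [[1, 2],
     [3, 4],
     [5, 6]]          # n = 3 rows, m = 2 columns
c = [1, 0, 2]         # n x 1
result = at_times_c(A, c)
print(result)  # [11, 14]
```

The result has length m, so it only fits this pattern when a single row of A is small enough to hold locally, which RowMatrix already assumes.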



Thanks,
Alex

From: Reza Zadeh <reza@databricks.com>
Sent: Monday, January 12, 2015 1:58 PM
To: Alex Minnaar
Cc: user@spark.incubator.apache.org
Subject: Re: RowMatrix multiplication
 
As you mentioned, you can perform A * b, where A is a rowmatrix and b is a local matrix.

From your email, I figure you want to compute b * A^T. To do this, you can compute C = A b^T,
whose result is the transpose of what you were looking for, i.e. C^T = b * A^T. To undo the
transpose, you would have to transpose C manually yourself. Be careful though, because the result
might not have each Row fit in memory on a single machine, which is what RowMatrix requires.
This danger is why we didn't provide a transpose operation in RowMatrix natively.
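
The identity used above, (A b^T)^T = b A^T, can be checked on a small example. This is only an illustrative sketch with plain Python lists; matmul and transpose are hypothetical helpers written for this check, not Spark APIs, and the small matrix A merely stands in for a distributed RowMatrix.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

A = [[1, 2],
     [3, 4],
     [5, 6]]        # 3 x 2, stands in for the distributed RowMatrix
b = [[1, 0],
     [2, 1]]        # 2 x 2 local matrix

C = matmul(A, transpose(b))          # C = A b^T, a RowMatrix-times-local product
direct = matmul(b, transpose(A))     # b A^T, the product actually wanted
print(transpose(C) == direct)  # True
```

Note that the rows of C^T are as long as the number of rows of A, which is the memory danger described above.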

To address this and more, there is an effort to provide more comprehensive linear algebra
through block matrices, which will likely make it to 1.3:
https://issues.apache.org/jira/browse/SPARK-3434

Best,
Reza

On Mon, Jan 12, 2015 at 6:33 AM, Alex Minnaar <aminnaar@verticalscope.com> wrote:
I have a rowMatrix on which I want to perform two multiplications.  The first is a right
multiplication with a local matrix which is fine.  But after that I also wish to right multiply
the transpose of my rowMatrix with a different local matrix.  I understand that there is
no functionality to transpose a rowMatrix at this time but I was wondering if anyone could
suggest any kind of workaround for this.  I had thought that I might be able to initially
create two rowMatrices - a normal version and a transposed version - and use either when appropriate. 
Can anyone think of another alternative?

Thanks,
Alex