mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Zhengguo 'Mike' SUN <zhengguo...@yahoo.com>
Subject Re: Fwd: A MapReduce Algorithm for Matrix Multiplication
Date Wed, 09 Dec 2009 02:47:10 GMT
Hi Jake,

I am implementing the classical multiplicative update rule of NMF. The matrix to be factorized
is really big and sparse. Are you suggesting that I can use some specialised algorithms for
sparse matrix instead of the standard multiplication algorithm? But what algorithms are you
referring to? Could you please provide some pointers?

Thanks




________________________________
From: Jake Mannix <jake.mannix@gmail.com>
To: mahout-user@lucene.apache.org
Sent: Tue, December 8, 2009 8:53:39 PM
Subject: Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

Hi Mike,

  In the NMF work you're doing, the matrices you're multiplying are very
thin, right?  They're a handful of dense vectors on the left, an equal
number of dense vectors on the right, and you're only computing the values
in the outer product which coincide with the nonzero entries of the (sparse)
input matrix?  In which case you're not really doing the full matrix
multiplication, and doing this a slightly specialized way should provide
some serious speedup.

  Unless you're not dealing with the "really big, but sparse" case, and
minimizing error only over the nonzero inputs?

  -jake

On Tue, Dec 8, 2009 at 5:43 PM, Zhengguo 'Mike' SUN
<zhengguosun@yahoo.com>wrote:

> Hi Ted, John,
>
> I am also interested in matrix multiplication. Actually, I am implementing
> Non-negative Matrix Factorization, which basically is iterations of matrix
> multiplication. What I did is exactly as what Ted said: doing an outer
> product using one column from left matrix and one row from right matrix and
> summing up all the results. Idealy, this could be done using only one job:
> mappers doing multiplication and reducers doing summing. But the thing is
> that how do you match a column from left matrix and a row from right matrix.
> Assuming that two input matrices are give in column major order and row
> major order, it seems to me that you have to implement a customized
> InputFormat to do the matching. Or maybe you could use one matrix as input
> and read the other inside mappers, but this also has problem when you have a
> lot of mappers. The straightforward way seems to be using two jobs. The
> mappers of the first job emit the id and the content of the row or column it
>  reads and the reducers do the multiplication. The second job is to do the
> summing. This is kind of similar to what John did. But it is inefficient if
> you have many iterations. Any comments?
>
>
>
>
> ________________________________
> From: Ted Dunning <ted.dunning@gmail.com>
> To: j-norstad@northwestern.edu; mahout-user <mahout-user@lucene.apache.org
> >
> Sent: Tue, December 8, 2009 2:59:06 PM
> Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
>
> John,
>
> Your comment about matrix multiplication was forwarded to the mahout-user
> emailing list.
>
> I have a question for you about your approach.  Have you considered doing
> the multiplication in a single step by storing the the first matrix in
> column major order and the second in row major order?  The idea would be to
> do a block-wise outer product and use the combiner to sum elements ahead of
> the reducer.  This could be done by reading a (block) column from the left
> matrix and a (block) row from the second and emitting all of the (block)
> elements of the outer product of this column and row.  The key emitted with
> these blocks would be the i,j index of the final location of each block in
> the final product.  The combiner and reducer could add all of the blocks
> together and the use of the combiner would serve to substantially minimize
> the amount of network traffic.
>
> This alternative is equivalent to inverting the loop nesting in a
> conventional matrix multiplication algorithm.  It is also very closely
> related to the way that cooccurrence counting is typically done in Hadoop.
>
> Is your project an ongoing effort?
>
> ---------- Forwarded message ----------
> From: Robin Anil <robin.anil@gmail.com>
> Date: Tue, Dec 8, 2009 at 11:09 AM
> Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
> To: mahout-user@lucene.apache.org
> Cc: John Norstad <j-norstad@northwestern.edu>
>
>
> ---------- Forwarded message ----------
> From: John Norstad <j-norstad@northwestern.edu>
> Date: Tue, Dec 8, 2009 at 10:14 PM
> Subject: A MapReduce Algorithm for Matrix Multiplication
> To: MapReduce <mapreduce-user@hadoop.apache.org>
>
>
> As an exercise while learning MapReduce, I developed an algorithm for
> matrix multiplication and wrote it up on my web site. If you're
> interested, it's at:
>
>  http://homepage.mac.com/j.norstad/matrix-multiply
>
> I present the algorithm in English, as pseudo-code, and as Java source
> code for Hadoop.
>
> --
> John Norstad
> Academic and Research Technologies
> Northwestern University
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
>
>
>
>



      
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message