mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Fwd: A MapReduce Algorithm for Matrix Multiplication
Date Wed, 09 Dec 2009 18:05:18 GMT
See here:

http://arxiv.org/abs/0909.4061v1

On Tue, Dec 8, 2009 at 6:47 PM, Zhengguo 'Mike' SUN
<zhengguosun@yahoo.com>wrote:

> Hi Jake,
>
> I am implementing the classical multiplicative update rule of NMF. The
> matrix to be factorized is really big and sparse. Are you suggesting that I
> can use some specialised algorithms for sparse matrix instead of the
> standard multiplication algorithm? But what algorithms are you referring to?
> Could you please provide some pointers?
>
> Thanks
>
>
>
>
> ________________________________
> From: Jake Mannix <jake.mannix@gmail.com>
> To: mahout-user@lucene.apache.org
> Sent: Tue, December 8, 2009 8:53:39 PM
> Subject: Re: Fwd: A MapReduce Algorithm for Matrix Multiplication
>
> Hi Mike,
>
>   In the NMF work you're doing, the matrices you're multiplying are very
> thin, right?  They're a handful of dense vectors on the left, an equal
> number of dense vectors on the right, and you're only computing the values
> in the outer product which coincide with the nonzero entries of the
> (sparse)
> input matrix?  In which case you're not really doing the full matrix
> multiplication, and doing this a slightly specialized way should provide
> some serious speedup.
>
>   Unless you're not dealing with the "really big, but sparse" case, and
> minimizing error only over the nonzero inputs?
>
>   -jake
>
> On Tue, Dec 8, 2009 at 5:43 PM, Zhengguo 'Mike' SUN
> <zhengguosun@yahoo.com>wrote:
>
> > Hi Ted, John,
> >
> > I am also interested in matrix multiplication. Actually, I am
> implementing
> > Non-negative Matrix Factorization, which basically is iterations of
> matrix
> > multiplication. What I did is exactly as what Ted said: doing an outer
> > product using one column from left matrix and one row from right matrix
> and
> > summing up all the results. Idealy, this could be done using only one
> job:
> > mappers doing multiplication and reducers doing summing. But the thing is
> > that how do you match a column from left matrix and a row from right
> matrix.
> > Assuming that two input matrices are give in column major order and row
> > major order, it seems to me that you have to implement a customized
> > InputFormat to do the matching. Or maybe you could use one matrix as
> input
> > and read the other inside mappers, but this also has problem when you
> have a
> > lot of mappers. The straightforward way seems to be using two jobs. The
> > mappers of the first job emit the id and the content of the row or column
> it
> >  reads and the reducers do the multiplication. The second job is to do
> the
> > summing. This is kind of similar to what John did. But it is inefficient
> if
> > you have many iterations. Any comments?
> >
> >
> >
> >
> > ________________________________
> > From: Ted Dunning <ted.dunning@gmail.com>
> > To: j-norstad@northwestern.edu; mahout-user <
> mahout-user@lucene.apache.org
> > >
> > Sent: Tue, December 8, 2009 2:59:06 PM
> > Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
> >
> > John,
> >
> > Your comment about matrix multiplication was forwarded to the mahout-user
> > emailing list.
> >
> > I have a question for you about your approach.  Have you considered doing
> > the multiplication in a single step by storing the the first matrix in
> > column major order and the second in row major order?  The idea would be
> to
> > do a block-wise outer product and use the combiner to sum elements ahead
> of
> > the reducer.  This could be done by reading a (block) column from the
> left
> > matrix and a (block) row from the second and emitting all of the (block)
> > elements of the outer product of this column and row.  The key emitted
> with
> > these blocks would be the i,j index of the final location of each block
> in
> > the final product.  The combiner and reducer could add all of the blocks
> > together and the use of the combiner would serve to substantially
> minimize
> > the amount of network traffic.
> >
> > This alternative is equivalent to inverting the loop nesting in a
> > conventional matrix multiplication algorithm.  It is also very closely
> > related to the way that cooccurrence counting is typically done in
> Hadoop.
> >
> > Is your project an ongoing effort?
> >
> > ---------- Forwarded message ----------
> > From: Robin Anil <robin.anil@gmail.com>
> > Date: Tue, Dec 8, 2009 at 11:09 AM
> > Subject: Fwd: A MapReduce Algorithm for Matrix Multiplication
> > To: mahout-user@lucene.apache.org
> > Cc: John Norstad <j-norstad@northwestern.edu>
> >
> >
> > ---------- Forwarded message ----------
> > From: John Norstad <j-norstad@northwestern.edu>
> > Date: Tue, Dec 8, 2009 at 10:14 PM
> > Subject: A MapReduce Algorithm for Matrix Multiplication
> > To: MapReduce <mapreduce-user@hadoop.apache.org>
> >
> >
> > As an exercise while learning MapReduce, I developed an algorithm for
> > matrix multiplication and wrote it up on my web site. If you're
> > interested, it's at:
> >
> >  http://homepage.mac.com/j.norstad/matrix-multiply
> >
> > I present the algorithm in English, as pseudo-code, and as Java source
> > code for Hadoop.
> >
> > --
> > John Norstad
> > Academic and Research Technologies
> > Northwestern University
> >
> >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
> >
> >
> >
> >
> >
>
>
>
>




-- 
Ted Dunning, CTO
DeepDyve

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message