mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <>
Subject Re: Fwd: A MapReduce Algorithm for Matrix Multiplication
Date Wed, 09 Dec 2009 01:56:35 GMT
NMF should be amenable to the stochastic decomposition trick.  That reduces
the problem to a much smaller factorization that you could plausibly do
using sequential techniques.  Jake Manning is working on getting
factorizations going, but I don't know if he has gotten to NMF.

Jake's comment about doing the multiplication cleanly using sparse
techniques is important, but you should also remember that if you have a
sparse matrix stored in triple form, doing a transpose is simply a matter of
changing which index your consider a row and which a column.  For some
algorithms, it might pay to sort the result once so that you can do a
sort-join for the multiplication, but that isn't hard.

On Tue, Dec 8, 2009 at 5:43 PM, Zhengguo 'Mike' SUN

> Hi Ted, John,
> I am also interested in matrix multiplication. Actually, I am implementing
> Non-negative Matrix Factorization, which basically is iterations of matrix
> multiplication. What I did is exactly as what Ted said: doing an outer
> product using one column from left matrix and one row from right matrix and
> summing up all the results. Idealy, this could be done using only one job:
> mappers doing multiplication and reducers doing summing. But the thing is
> that how do you match a column from left matrix and a row from right matrix.
> Assuming that two input matrices are give in column major order and row
> major order, it seems to me that you have to implement a customized
> InputFormat to do the matching. Or maybe you could use one matrix as input
> and read the other inside mappers, but this also has problem when you have a
> lot of mappers. The straightforward way seems to be using two jobs. The
> mappers of the first job emit the id and the content of the row or column it
> reads and the reducers do the multiplication. The second job is to do the
> summing. This is kind of similar to what John did. But it is inefficient if
> you have many iterations. Any comments?

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message