On Tue, Dec 8, 2009 at 6:49 PM, Zhengguo 'Mike' SUN
wrote:
> What stochastic decomposition trick are you guys referring to? I appreciate
> if you could provide some pointers.
>
http://arxiv.org/abs/0909.4061
Basically the trick is: first project your column space (or row space, but
let's think of the rows as your
data points) randomly down from gigantically huge down to some small, but
still larger than the final
dimension you want to reduce to (say: project from millions of columns down
to thousands, where you
want to end up with a rank-300 factorization). Then take your num_rows by
small_num matrix and square
it, leaving you with a small_num by small_num matrix you can decompose in
memory using whatever
techniques you know to work on small, but dense, matrices.
It works because of the Johnson-Lindenstrauss
lemma.
-jake
>
>
> ________________________________
> From: Jake Mannix
> To: mahout-user@lucene.apache.org
> Cc: j-norstad@northwestern.edu
> Sent: Tue, December 8, 2009 9:08:19 PM
> Subject: Re: Fwd: A MapReduce Algorithm for Matrix Multiplication
>
> On Tue, Dec 8, 2009 at 5:56 PM, Ted Dunning wrote:
>
> > NMF should be amenable to the stochastic decomposition trick. That
> reduces
> > the problem to a much smaller factorization that you could plausibly do
> > using sequential techniques. Jake Mannix is working on getting
> > factorizations going, but I don't know if he has gotten to NMF.
> >
>
> I'm not currently working on NMF, but the stochastic decomposition trick
> will be
> in there soon, which should allow all this pretty easily.
>
> Although... if you start with a positive matrix, you may want a specialized
> random
> projector which preserves positivity for this kind of thing. But I'm not
> sure, I
> haven't looked too closely at what happens when you try to do this trick on
> NMF.
>
> -jake
>
>
>
>
>