The ith row of the matrix A'A contains all items and their similarity degrees to
the item represented by the ith column of the matrix A.
I guess it is enough to use only a subset of A'A at the final step, that is,
the rows that represent the items in the active user's history.
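To make that concrete, here is a minimal plain-Python sketch (not the Mahout implementation; the function and variable names are my own) of why only those rows are needed: since A'A is symmetric, (A'A)h is just a weighted sum of the rows of A'A for items the active user has rated.

```python
# Sketch: compute the recommendation vector (A'A)h using only the rows
# of C = A'A for items in the active user's history h. Because C is
# symmetric, ((C h))_i = sum over j with h[j] != 0 of h[j] * C[j][i],
# so only len(history) rows of C ever have to be loaded.
import numpy as np

def recommend(C_rows, history, n_items):
    """C_rows: dict item -> row of A'A (loaded on demand);
    history: dict item -> rating for the active user."""
    scores = np.zeros(n_items)
    for j, rating in history.items():
        scores += rating * C_rows[j]   # weighted sum of a few rows
    return scores

# Toy data: 3 items, with only the two rows the user's history touches.
C_rows = {0: np.array([2.0, 1.0, 0.0]),
          2: np.array([0.0, 1.0, 3.0])}
h = {0: 1.0, 2: 2.0}
print(recommend(C_rows, h, 3))  # -> [2. 3. 6.]
```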
btw, I also want to contribute to that implementation, once we decide on the
algorithm.
On Fri, Dec 4, 2009 at 10:33 AM, Sean Owen <srowen@gmail.com> wrote:
> Yes, this makes sense. I do need two passes. One pass converts input
> from "user,item,rating" triples into user vectors. Then the second
> step builds the cooccurrence A'A product. I agree that it will be
> faster to take a shortcut than to compute A'A properly.
>
> (Though I'm curious how this works; it looks deceptively easy, this
> outer product approach. Isn't v cross v potentially huge? Or is it
> likely to be sparse enough not to matter?)
>
> I understand the final step in principle, which is to compute (A'A)h.
> But I'm guessing A'A is too big to fit in memory, so I can sideload
> the rows of A'A one at a time and compute the product rather
> manually.
>
>
> On Thu, Dec 3, 2009 at 8:28 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > I think you can merge my passes into a single pass in which you
> > compute the row and column sums at the same time that you compute
> > the product. That is more complicated, though, and I hate fancy
> > code. So you are right in practice that I have always had two
> > passes. (Although Pig might be clever enough by now to merge them.)
> >
> > There is another pass in which you use all of the sums to do the
> > sparsification. I don't know if that could be done in the same pass
> > or not.
>
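The merged pass discussed above can be sketched in plain Python (again, not the Mahout/Hadoop code; names are my assumptions): A'A is the sum over users of the outer product v ⊗ v, and the row/column sums needed later for sparsification can be accumulated in the same loop. A user with k rated items touches only k*k cells, which is why the intermediate stays manageable even when A'A itself is large.

```python
# Sketch: one pass over sparse user vectors ({item: rating} dicts)
# that accumulates both the cooccurrence product A'A as a sum of
# outer products v (x) v, and the row sums needed for sparsification.
from collections import defaultdict

def cooccurrence_with_sums(user_vectors):
    C = defaultdict(float)        # (i, j) -> entry of A'A
    row_sums = defaultdict(float)
    for v in user_vectors:        # each user's k items touch only k*k cells
        for i, ri in v.items():
            for j, rj in v.items():
                C[(i, j)] += ri * rj
                row_sums[i] += ri * rj
    # A'A is symmetric, so the column sums equal the row sums.
    return C, row_sums

users = [{0: 1.0, 1: 1.0}, {1: 1.0, 2: 1.0}]
C, sums = cooccurrence_with_sums(users)
print(C[(1, 1)], sums[1])  # -> 2.0 4.0
```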

Gökhan Çapan
