From: Sean Owen
Subject: Re: Taste-GenericItemBasedRecommender
Date: Fri, 04 Dec 2009 08:33:19 GMT
Yes, this makes sense. I do need two passes. One pass converts input
from "user,item,rating" triples into user vectors. Then the second
step builds the co-occurrence A'A product. I agree that it will be
faster to take a shortcut than to compute A'A properly.
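
To make the two passes concrete, here is a minimal in-memory sketch.
The function names are mine, not anything in Taste, and it assumes the
ratings are ignored so that A is a 0/1 matrix and A'A holds plain
co-occurrence counts. A real job would run each pass as a map/reduce
step rather than over Python dicts.

```python
from collections import defaultdict
from itertools import product


def build_user_vectors(triples):
    """Pass 1: group (user, item, rating) triples into sparse user vectors."""
    vectors = defaultdict(dict)
    for user, item, rating in triples:
        vectors[user][item] = rating
    return dict(vectors)


def cooccurrence(user_vectors):
    """Pass 2: accumulate A'A as a sum of per-user outer products.

    Assumption: ratings are binarized, so each pair of items seen by the
    same user adds 1 to the count for that pair.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for vector in user_vectors.values():
        items = list(vector)
        # The outer product of one user's row with itself touches every
        # (i, j) pair of items that user has rated, including i == j.
        for i, j in product(items, repeat=2):
            counts[i][j] += 1
    return {i: dict(row) for i, row in counts.items()}
```

The diagonal entry for an item ends up as the number of users who
rated it, and the result is symmetric.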

(Though I'm curious how this works. This outer product approach looks
deceptively easy. Isn't v cross v potentially huge? Or is it likely to
be sparse enough not to matter?)
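
A rough way to size the question: a user vector with k non-zero items
produces k * k non-zero entries in its outer product, so the total
emitted is the sum of k squared over users. The helper below is only
an illustration with a name I made up, and it counts entries before
any combining or sparsification.

```python
def outer_product_entries(items_per_user):
    """Count the (i, j) entries emitted when each user vector with k
    non-zero items contributes a k-by-k outer product."""
    return sum(k * k for k in items_per_user)
```

So 1,000 users with 10 items each emit 100,000 entries in total, while
a single user with 10,000 items emits 100,000,000 alone. Whether it
matters would seem to depend on how heavy the heaviest users are.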

I understand the final step in principle, which is to compute (A'A)h.
But I suspect A'A is too big to fit in memory, so I could
side-load the rows of A'A one at a time and compute it rather
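
A sketch of what I mean by taking the rows one at a time, assuming the
rows of A'A can be streamed as (item, sparse row) pairs from wherever
they are stored and that h is a small sparse vector of one user's
preferences. The names here are hypothetical, not an existing API.

```python
def multiply_by_rows(cooccurrence_rows, h):
    """Compute (A'A)h while holding only one row of A'A at a time.

    cooccurrence_rows: iterable of (item, {other_item: count}) pairs.
    h: dict mapping item -> preference value for a single user.
    """
    scores = {}
    for item, row in cooccurrence_rows:
        # Dot product of this row with h, skipping items absent from h.
        total = sum(count * h[other] for other, count in row.items() if other in h)
        if total:
            scores[item] = total
    return scores
```

Only h and the output scores stay in memory; each row is dropped as
soon as its dot product has been taken.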

On Thu, Dec 3, 2009 at 8:28 PM, Ted Dunning wrote:
> I think you can merge my passes into a single pass in which you compute the
> row and column sums at the same time that you compute the product.  That is
> more complicated, though, and I hate fancy code.  So you are right in
> practice that I have always had two passes.  (although pig might be clever
> enough by now to merge them)
> There is another pass in which you use all of the sums to do the
> sparsification.  I don't know if that could be done in the same pass or not.

