mahout-user mailing list archives

From Gökhan Çapan <gkhn...@gmail.com>
Subject Re: Taste-GenericItemBasedRecommender
Date Fri, 04 Dec 2009 08:57:10 GMT
The ith row of the matrix A'A contains, for every item, its similarity to
the item represented by the ith column of the matrix A.
I think it is enough to use only a subset of A'A at the final step, namely
the rows for the items that are in the active user's history.
By the way, I would also like to contribute to that implementation, if we
can agree on the algorithm.
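
A minimal sketch of that idea, in Python rather than Mahout's Java API. The
function names `cooccurrence` and `recommend` and the dict-based sparse
vectors are assumptions made for illustration, not existing Mahout code. It
accumulates A'A as a sum of per-user outer products, then computes (A'A)h by
reading only the rows for items in the user's history. This gives the same
result as the full product because A'A is symmetric, so row i equals column i.

```python
from collections import defaultdict


def cooccurrence(user_vectors):
    """Accumulate A'A as a sparse dict-of-dicts.

    Each user vector is a dict {item: rating}. A'A is the sum over users
    of the outer product of that user's vector with itself, so only item
    pairs that co-occur for at least one user are ever stored.
    """
    ata = defaultdict(lambda: defaultdict(float))
    for vec in user_vectors:
        for i, ri in vec.items():
            for j, rj in vec.items():
                ata[i][j] += ri * rj
    return ata


def recommend(ata, history):
    """Compute (A'A)h using only the rows for items in the history.

    history is a dict {item: rating} for the active user. Because A'A is
    symmetric, (A'A)h equals the sum of h[i] * row_i over items i in h.
    """
    scores = defaultdict(float)
    for item, weight in history.items():
        # .get avoids creating empty rows for items never seen in training
        for other, sim in ata.get(item, {}).items():
            scores[other] += sim * weight
    return dict(scores)


# Hypothetical toy data: three users with binary preferences.
users = [
    {"a": 1.0, "b": 1.0},
    {"b": 1.0, "c": 1.0},
    {"a": 1.0, "c": 1.0, "d": 1.0},
]
ata = cooccurrence(users)
scores = recommend(ata, {"a": 1.0, "b": 1.0})
```

Only the rows for "a" and "b" are read here, so at scale those rows could be
loaded one at a time instead of holding all of A'A in memory. Items already
in the history still receive scores and would need to be filtered out before
ranking.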

On Fri, Dec 4, 2009 at 10:33 AM, Sean Owen <srowen@gmail.com> wrote:

> Yes, this makes sense. I do need two passes. One pass converts input
> from "user,item,rating" triples into user vectors. Then the second
> step builds the co-occurrence A'A product. I agree that it will be
> faster to take a shortcut than properly compute A'A.
>
> (Though I'm curious how this works -- looks deceptively easy, this
> outer product approach. Isn't v cross v potentially huge? or likely to
> be sparse enough to not matter)
>
> I understand the final step in principle, which is to compute (A'A)h.
> But I keep guessing A'A is too big to fit in memory? So I can
> side-load the rows of A'A one at a time and compute it rather
> manually.
>
>
> On Thu, Dec 3, 2009 at 8:28 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > I think you can merge my passes into a single pass in which you compute the
> > row and column sums at the same time that you compute the product.  That is
> > more complicated, though, and I hate fancy code.  So you are right in
> > practice that I have always had two passes.  (although pig might be clever
> > enough by now to merge them)
> >
> > There is another pass in which you use all of the sums to do the
> > sparsification.  I don't know if that could be done in the same pass or not.
>



-- 
Gökhan Çapan
