mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sean Owen <sro...@gmail.com>
Subject Re: Taste-GenericItemBasedRecommender
Date Thu, 03 Dec 2009 15:02:33 GMT
I'm going ahead with implementing something like this as a simple but
real distributed implementation.

One question that maybe Ted already knows the answer to: how would I
iterate over A_u* for each A_u*? the mapper would touch only one each.
I guess I could use a whole MR step to generate the cartesian product
so to speak.

On Thu, Sep 10, 2009 at 1:44 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> Another implementation is to iterate over users emitting cooccurrence
> pairs.  These can be sorted on the items involved and the counts
> aggregated.  This is ideal for map reduce implementation.
>
> Roughly the matrix approach becomes this:
>
>  Map:
>    input record is A_u*, all items for a particular user u
>    for each item i in A_u*
>       yield ((
>       yield ((i,*), 1)
>       for each item j in A_u*
>          yield ((i, j), 1)
>
>   Reduce:
>      input record is (i,j), (k1, k2, ... kn)
>      yield (i,j), sum k*
>
>          or
>
>       input record is (i,*), (k1, k2, ... ,kn)
>       yield (i,*) sum k*
>
> This computes the elements of A'A (item cooccurrence counts) and the row
> sums of A'A in one step.   You might want to compute the column sums of A
> first to use these for weighting A before using this program.
>
> Then you need another MR step to bring together the overall total, the row
> counts and the cell counts for final reduction.
>
> This is inside out compared with the most obvious implementation and there
> is no place where we really have two entire item vectors.  This will make it
> rather harder to have a totally simple pluggable weighting scheme that runs
> as fast.

Mime
View raw message