mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Mahout performance issues
Date Fri, 02 Dec 2011 18:10:39 GMT
This isn't really the problem.

Suppose that user A is connected to all items.

Suppose that all users are connected to item 1, even if to no other item.

Any recommendation will pull in A and thus all items become candidate
recommendations.  My suggestion is two-fold:  (1) eliminate item 1 entirely
and (2) down-sample the items that A is connected to.

Both are important.  (1) is important to avoid bringing all users into a
basically pointless computation, and (2) is important because A's history
makes the cooccurrence matrix dense, which hurts even if you don't compute
the entire matrix.
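
A minimal sketch of the two steps, in Python rather than Mahout's own code.
The function name, the 90% "ubiquitous item" threshold and the per-user cap
are all illustrative assumptions, not Mahout parameters, and the cap here is
a plain uniform sample:

```python
import random
from collections import Counter


def prune_interactions(histories, max_item_share=0.9, max_items_per_user=10, seed=0):
    """Return a pruned copy of {user: set(items)}.

    Hypothetical helper: both thresholds are assumptions chosen for the example.
    """
    rng = random.Random(seed)
    n_users = len(histories)

    # Count how many users are connected to each item.
    item_counts = Counter(item for items in histories.values() for item in items)

    # (1) Eliminate items that almost every user is connected to.
    ubiquitous = {
        item for item, count in item_counts.items()
        if count / n_users > max_item_share
    }

    pruned = {}
    for user, items in histories.items():
        kept = sorted(items - ubiquitous)
        # (2) Down-sample long histories so one heavy user cannot
        # make the cooccurrence matrix dense.
        if len(kept) > max_items_per_user:
            kept = rng.sample(kept, max_items_per_user)
        pruned[user] = set(kept)
    return pruned


if __name__ == "__main__":
    histories = {
        "A": set(range(1, 101)),  # connected to every item
        "X": {1, 2},
        "Y": {1, 3},
        "Z": {1},                 # connected to item 1 only
    }
    for user, items in sorted(prune_interactions(histories).items()):
        print(user, sorted(items))
```

After pruning, item 1 no longer links every user to every other user, and A
contributes at most max_items_per_user rows of cooccurrence.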

On Fri, Dec 2, 2011 at 10:05 AM, Sean Owen <srowen@gmail.com> wrote:

> Say we're recommending for user A. User A is connected to items 1, 2, 3.
> Those items are connected to other users X, Y, Z. And those users in turn
> are connected to items 100, 101, 102, 103....
>
> You can down-sample three things:
>
> 1. The 1,2,3
> 2. The X,Y,Z
> 3. The 100,101,102
>
> We already do #2. I am suggesting we add #3.
>
> On Fri, Dec 2, 2011 at 6:00 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > Does #1 mean down-sample the items in each user?  Or does it only
> > down-sample the number of items for the user that we are producing
> > recommendations for?
> >
> > I recommend down-sampling for all users.  If you down-sample biased
> > toward low-frequency items, then this will also kill the problem of
> > high-frequency items and you get all the performance gains you are
> > talking about and more, without significant error.
>
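
The frequency-biased down-sampling quoted above could look roughly like
this in Python. The weighting (1 / item frequency) and the function name are
assumptions made for the sketch; the thread does not specify the exact bias.
The selection uses the log form of Efraimidis-Spirakis weighted sampling
without replacement:

```python
import math
import random


def downsample_rare_biased(items, item_counts, k, rng):
    """Keep at most k items, preferring items with low global frequency.

    Hypothetical helper: each item gets weight 1 / item_counts[item], which
    is an assumed choice of bias, not one stated in the thread.
    """
    items = sorted(items)
    if len(items) <= k:
        return set(items)

    keyed = []
    for item in items:
        u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is defined
        # Efraimidis-Spirakis key in log form: log(u) / weight,
        # and weight = 1 / count, so the key is log(u) * count.
        keyed.append((math.log(u) * item_counts[item], item))

    # The k largest keys form the weighted sample.
    keyed.sort(reverse=True)
    return {item for _, item in keyed[:k]}


if __name__ == "__main__":
    counts = {"hot": 1000, "a": 1, "b": 1, "c": 1, "d": 1, "e": 1}
    rng = random.Random(42)
    print(sorted(downsample_rare_biased(counts.keys(), counts, 5, rng)))
```

Applied to every user's history, a scheme like this rarely keeps a very
popular item, so the high-frequency items mostly drop out without a separate
filtering pass.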
