mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Mahout performance issues
Date Fri, 02 Dec 2011 18:05:38 GMT
Say we're recommending for user A. User A is connected to items 1, 2, 3.
Those items are connected to other users X, Y, Z. And those users in turn
are connected to items 100, 101, 102, 103....

You can down-sample three things:

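1. The 1,2,3 (user A's own items)
2. The X,Y,Z (the other users connected to those items)
3. The 100,101,102 (the items connected to those users)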

We already do #2. I am suggesting we add #3.
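
To make the three sampling points concrete, here is a rough sketch in Python rather than Mahout's Java. It is not Mahout's implementation: the function names, the dictionary-based data model, and the three cap parameters are all hypothetical, and the sampling is plain uniform sampling without replacement.

```python
import random


def sample(items, max_size, rng):
    """Return at most max_size elements, chosen uniformly without replacement."""
    items = list(items)
    if len(items) <= max_size:
        return items
    return rng.sample(items, max_size)


def candidate_items(user, items_by_user, users_by_item,
                    max_seed_items, max_users_per_item, max_items_per_user,
                    seed=0):
    """Collect candidate items for `user`, capping the fan-out at each hop.

    items_by_user and users_by_item are hypothetical dict-of-list lookups,
    not Mahout data structures.
    """
    rng = random.Random(seed)
    own = set(items_by_user[user])
    candidates = set()
    # #1: down-sample the user's own items (the 1,2,3)
    for item in sample(sorted(own), max_seed_items, rng):
        others = [u for u in users_by_item[item] if u != user]
        # #2: down-sample the users connected to each item (the X,Y,Z)
        for other in sample(sorted(others), max_users_per_item, rng):
            # #3: down-sample each of those users' items (the 100,101,102)
            for cand in sample(sorted(items_by_user[other]),
                               max_items_per_user, rng):
                if cand not in own:
                    candidates.add(cand)
    return candidates
```

With all three caps set high, this returns every item reachable in two hops. Lowering the third cap bounds the work done per neighboring user, which is the part #3 would add.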

On Fri, Dec 2, 2011 at 6:00 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Does #1 mean down-sample the items in each user?  Or does it only
> down-sample the number of items for the user that we are producing
> recommendations for?
>
> I recommend down-sampling for all users.  If you down-sample biased toward
> low-frequency items, then this will also kill the problem of high-frequency
> items, and you get all the performance gains you are talking about and more,
> without significant error.
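
One possible reading of the quoted suggestion, down-sampling biased toward low-frequency items, is weighted sampling without replacement. The sketch below assumes a weight of 1/frequency per item and uses the Efraimidis-Spirakis key trick (draw u uniformly, keep the items with the largest u^(1/weight)). The weighting scheme, the function name, and the `item_frequency` lookup are assumptions for illustration and do not come from the thread or from Mahout.

```python
import random


def downsample_low_frequency(items, item_frequency, max_size, rng):
    """Keep at most max_size items, favoring items with low frequency.

    item_frequency maps each item to how many users have it (assumed >= 1).
    """
    items = list(items)
    if len(items) <= max_size:
        return items
    keyed = []
    for item in items:
        # Assumed weighting: rarer items get a larger weight.
        weight = 1.0 / item_frequency[item]
        # Efraimidis-Spirakis key: larger weights tend to produce larger keys.
        key = rng.random() ** (1.0 / weight)
        keyed.append((key, item))
    # Keep the max_size items with the largest keys.
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:max_size]]
```

Because very popular items are rarely kept, each user's sampled item list is bounded in size and mostly free of the high-frequency items that drive the fan-out.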
