mahout-user mailing list archives

From Sean Owen <>
Subject Re: Recommendations from flat data
Date Fri, 01 May 2009 18:10:03 GMT
As a follow-up on this, here's a small result that should hold:

Setting the sampling rate to 1/X (e.g. a rate of 20% means X=5)
should, of course, reduce the time spent finding a neighborhood by a
factor of X. Assuming users are scattered fairly evenly around your
rating-space, the average distance to the users in your computed
neighborhood also increases by a factor of X.

So you get results X times faster, but those results are X times
'worse'. This sounds bad, but consider that users 5 times farther away
in your rating-space may still be suitable neighbors and yield the
same recommendations.
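
A rough way to check the distance claim is a quick simulation. The sketch below is not Mahout code. It assumes a one-dimensional rating-space with users spread uniformly over [0, 1], and the function name and parameters (distance_ratio, num_users, k, x, trials, seed) are made up for illustration. It compares the average distance to the k nearest neighbors when all users are considered against the same average when each user is kept with probability 1/X.

```python
import heapq
import random


def distance_ratio(num_users=5000, k=10, x=5, trials=100, seed=1):
    """Ratio of mean k-nearest-neighbor distance (sampled / full).

    Assumption: users are points drawn uniformly from a 1-D
    rating-space [0, 1]; each user is kept with probability 1/x.
    """
    rng = random.Random(seed)
    target = 0.5  # fixed target user in the middle, away from the edges
    full_total = 0.0
    sampled_total = 0.0
    for _ in range(trials):
        users = [rng.random() for _ in range(num_users)]
        # Keep each user independently with probability 1/x.
        sampled = [u for u in users if rng.random() < 1.0 / x]
        # Mean distance to the k nearest users, with and without sampling.
        full_near = heapq.nsmallest(k, (abs(u - target) for u in users))
        samp_near = heapq.nsmallest(k, (abs(u - target) for u in sampled))
        full_total += sum(full_near) / len(full_near)
        sampled_total += sum(samp_near) / len(samp_near)
    return sampled_total / full_total


if __name__ == "__main__":
    print("distance ratio for X=5:", distance_ratio(x=5))
```

Under these assumptions the printed ratio should land close to X. A real rating-space has many more dimensions and is unlikely to be this evenly populated, so treat this only as a sanity check of the argument, not as a measurement.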

On Fri, May 1, 2009 at 8:32 AM, Sean Owen <> wrote:
> It really depends on the nature of the data and what tradeoff you want
> to make. I have not studied this in detail. Anecdotally, on a
> large-ish data set you can ignore most users and still end up with an
> OK neighborhood.
> Actually I should do a bit of math to get an analytical result on
> this, let me do that.
