mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: A question regarding GenericUserBasedRecommender
Date Thu, 12 Aug 2010 19:53:33 GMT
I agree with your reading of what the Herlocker paper is saying. The
paper is focused on producing one estimated rating, not
recommendations. While those tasks are related -- recommendations are
those with the highest estimated ratings -- translating what's in
Herlocker directly to a recommendation algorithm is a significant
jump.

Done that way, you start with all items as the set of candidate
recommendations, and for each one, construct a neighborhood to estimate
a rating. Even with intelligent caching of user-user similarity -- the
framework does this -- building one neighborhood per candidate item is
orders of magnitude slower than building a single neighborhood per
user. It's possible, but I don't think it's realistic in practice.
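
To make that cost concrete, here is a minimal Java sketch of the literal per-item reading. It rests on assumptions that are not from Mahout: ratings held in plain maps (user to item to rating), each other user's similarity to the target already computed and passed in as `sim` (standing in for the similarity cache), a nearest-N neighborhood of size `n` drawn from the users who rated each item, and a similarity-weighted average that skips non-positive similarities. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch (not Mahout code): one neighborhood per candidate item.
// `sim` holds each other user's precomputed similarity to `target`.
final class PerItemNeighborhoods {

  static Map<String, Double> estimateAll(Map<String, Map<String, Double>> ratings,
                                         Map<String, Double> sim, String target, int n) {
    // Every item the target has not rated is a candidate.
    Set<String> candidates = new TreeSet<>();
    for (Map<String, Double> r : ratings.values()) candidates.addAll(r.keySet());
    candidates.removeAll(ratings.get(target).keySet());

    Map<String, Double> estimates = new TreeMap<>();
    for (String item : candidates) {
      // A fresh neighborhood for this item alone: the n most similar users who rated it.
      // Rebuilding it for every candidate item (a scan and a sort over all users each
      // time) is where the extra cost comes from.
      List<String> raters = new ArrayList<>();
      for (String u : ratings.keySet()) {
        if (!u.equals(target) && sim.containsKey(u) && ratings.get(u).containsKey(item)) {
          raters.add(u);
        }
      }
      raters.sort((a, b) -> Double.compare(sim.get(b), sim.get(a)));
      List<String> neighborhood = raters.subList(0, Math.min(n, raters.size()));

      // Similarity-weighted average; non-positive similarities are skipped (an assumption).
      double num = 0, den = 0;
      for (String u : neighborhood) {
        double s = sim.get(u);
        if (s <= 0) continue;
        num += s * ratings.get(u).get(item);
        den += s;
      }
      if (den > 0) estimates.put(item, num / den);
    }
    return estimates;
  }
}
```

With three other users at similarity 0.9, 0.5 and 0.1, where only the 0.1 user rated item `b`, this version still estimates `b` with `n = 2`, because the neighborhood for `b` is built only from the users who rated `b`.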

Instead, I had always assumed that the extension to an actual
recommendation algorithm was to let a single neighborhood define the
set of candidate items.
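
A simplified Java sketch of that single-neighborhood approach follows. It assumes ratings in plain maps (user to item to rating), precomputed similarities to the target user passed in as `sim`, a nearest-N neighborhood of size `n`, and a similarity-weighted average that skips non-positive similarities. The names are hypothetical; it follows the outline described here for GenericUserBasedRecommender (one neighborhood, candidates taken from it), but it is not Mahout's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch (not Mahout code): one neighborhood defines the candidate items.
final class SingleNeighborhood {

  static Map<String, Double> estimateAll(Map<String, Map<String, Double>> ratings,
                                         Map<String, Double> sim, String target, int n) {
    // 1. Build ONE neighborhood: the n users most similar to the target.
    List<String> others = new ArrayList<>();
    for (String u : ratings.keySet()) {
      if (!u.equals(target) && sim.containsKey(u)) others.add(u);
    }
    others.sort((a, b) -> Double.compare(sim.get(b), sim.get(a)));
    List<String> neighborhood = others.subList(0, Math.min(n, others.size()));

    // 2. Candidates: items some neighbor rated that the target has not rated.
    Set<String> candidates = new TreeSet<>();
    for (String u : neighborhood) candidates.addAll(ratings.get(u).keySet());
    candidates.removeAll(ratings.get(target).keySet());

    // 3. Estimate each candidate from that same neighborhood
    //    (similarity-weighted average; non-positive similarities skipped, an assumption).
    Map<String, Double> estimates = new TreeMap<>();
    for (String item : candidates) {
      double num = 0, den = 0;
      for (String u : neighborhood) {
        Double r = ratings.get(u).get(item);
        double s = sim.get(u);
        if (r == null || s <= 0) continue;
        num += s * r;
        den += s;
      }
      if (den > 0) estimates.put(item, num / den);
    }
    return estimates;
  }
}
```

The neighborhood is computed once per user rather than once per item, but only items rated by someone inside the top `n` can ever be recommended; an item rated only by a less similar user never becomes a candidate.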

The issue isn't quite coverage, I think. If a user has no similarity
to any user, there can be no neighborhood, under any approach, and no
recommendations. If there is any neighborhood, recommendations can be
made.

The issue here, it seems, is whether a particular item gets included in
the recommendations when it is included in *some* neighborhood but not
in all neighborhoods.

You give a good example where an item that, intuitively, should be
recommended is not a candidate for recommendation. I think there are
equally good examples of this idea going wrong. Say that the most
similar users all have a similarity near -1. Under a simple
threshold-based neighborhood approach, no recommendations would be
made, although there is indeed *some* neighborhood including those
dissimilar users from which recommendations could be made. But those
likely aren't good recommendations.
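
A sketch of the two neighborhood definitions on that scenario, where every other user has a similarity near -1 to the target. `sim` maps each other user to their similarity with the target; the class and method names are hypothetical. Mahout's NearestNUserNeighborhood and ThresholdUserNeighborhood are built on the same two ideas, but this is not their code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: nearest-N versus threshold neighborhoods.
final class NeighborhoodDefinitions {

  // Nearest-N: always returns up to n users, however dissimilar they are.
  static List<String> nearestN(Map<String, Double> sim, int n) {
    List<String> users = new ArrayList<>(sim.keySet());
    users.sort((a, b) -> Double.compare(sim.get(b), sim.get(a)));
    return new ArrayList<>(users.subList(0, Math.min(n, users.size())));
  }

  // Threshold: only users whose similarity reaches minSimilarity; may be empty.
  static List<String> threshold(Map<String, Double> sim, double minSimilarity) {
    List<String> users = new ArrayList<>();
    for (Map.Entry<String, Double> e : sim.entrySet()) {
      if (e.getValue() >= minSimilarity) users.add(e.getKey());
    }
    Collections.sort(users); // deterministic order only; membership is what matters
    return users;
  }
}
```

With similarities of -0.9, -0.95 and -0.99, `nearestN(sim, 2)` returns the two least dissimilar users, so recommendations could still be made from them, while `threshold(sim, 0.5)` returns an empty list and no recommendations follow.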

This is why I believe it's generally not a good idea to construct,
for each item, *some* neighborhood that includes that item and predict
from there. I can't say I've tested that claim, though.

But what the Herlocker paper suggests, and I agree with, is that using
threshold-based definitions of neighborhoods is a good idea. Given
that, I think the practical difference between constructing one
neighborhood and getting candidate items from it, versus constructing
a neighborhood for every item, is probably small. Again, I haven't
tested that claim directly.
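
One way to see why the difference could be small: under a pure similarity threshold, and assuming each estimate draws only on neighbors who rated the item, one up-front neighborhood and a per-item neighborhood select exactly the same users for every item. The Java sketch below checks that with hypothetical names and plain maps; it does not cover nearest-N neighborhoods, where capping the size makes the two strategies diverge.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check: with a pure threshold, both strategies use the same neighbors per item.
final class ThresholdEquivalence {

  // Neighbors used for `item` when ONE threshold neighborhood is built up front.
  // Assumes every user in `sim` has an entry in `ratings`.
  static Set<String> fromSingleNeighborhood(Map<String, Map<String, Double>> ratings,
                                            Map<String, Double> sim, double t, String item) {
    Set<String> neighborhood = new TreeSet<>();
    for (Map.Entry<String, Double> e : sim.entrySet()) {
      if (e.getValue() >= t) neighborhood.add(e.getKey());
    }
    Set<String> used = new TreeSet<>();
    for (String u : neighborhood) {
      if (ratings.get(u).containsKey(item)) used.add(u);
    }
    return used;
  }

  // Neighbors used when a fresh threshold neighborhood is built for this item only,
  // drawn from the users who rated it.
  static Set<String> fromPerItemNeighborhood(Map<String, Map<String, Double>> ratings,
                                             Map<String, Double> sim, double t, String item) {
    Set<String> used = new TreeSet<>();
    for (Map.Entry<String, Map<String, Double>> e : ratings.entrySet()) {
      String u = e.getKey();
      if (e.getValue().containsKey(item) && sim.containsKey(u) && sim.get(u) >= t) used.add(u);
    }
    return used;
  }
}
```

Both methods compute the threshold neighborhood intersected with the raters of the item, only in a different order, so under these assumptions they agree item by item.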

That's why I think the current implementation is OK, or at least
innocent until proven guilty, and why I believe it is also the
canonical approach.
