mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: GenericUserBasedRecommender vs GenericItemBasedRecommender
Date Thu, 23 Sep 2010 06:55:51 GMT
On Thu, Sep 23, 2010 at 2:35 AM, gabeweb <gabriel_webster@htc.com> wrote:
> I think the simple point is that the primary use case of a recommender is to
> return the n-best recommended items, rather than return the predicted rating
> for a single item.  In that case, if an item can't get a predicted rating
> because no users in the neighborhood have rated it, then that lack of
> ratings clearly suggests that similar users are not interested in that item!

That's right. I'm guessing the idea was to make a hybrid approach.
Still build a neighborhood and pick candidates from there, but compute
an estimate over everyone (neighborhood or not) that rated the item.

That is coherent, but you're still picking a neighborhood while now
basing the similarity computation on the tastes of potentially quite
dissimilar users. That might not be a good thing. In any event, I agree:
I don't think this hybrid is then "symmetric" with item-based
recommenders.
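
To make the hybrid concrete, here is a rough sketch in plain Python
rather than Mahout code. Everything in it is an assumption made for
illustration: the toy similarity (1 / (1 + mean absolute rating
difference)), the names top_neighbors, candidate_items and estimate,
and the sample ratings. The only difference between the two variants
is which set of raters is passed to estimate.

```python
def similarity(a, b):
    """Toy user-user similarity (an assumption, not Mahout's):
    1 / (1 + mean absolute rating difference) over co-rated items."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    mad = sum(abs(a[i] - b[i]) for i in common) / len(common)
    return 1.0 / (1.0 + mad)


def top_neighbors(user, ratings, n):
    """The n users most similar to `user`."""
    others = [v for v in ratings if v != user]
    others.sort(key=lambda v: similarity(ratings[user], ratings[v]), reverse=True)
    return others[:n]


def candidate_items(user, ratings, neighbors):
    """Items rated by some neighbor that `user` has not rated yet."""
    seen = ratings[user].keys()
    return {i for v in neighbors for i in ratings[v] if i not in seen}


def estimate(user, item, ratings, raters):
    """Similarity-weighted average of the ratings `raters` gave to `item`.
    Returns None when no rater with positive weight has rated the item."""
    num = den = 0.0
    for v in raters:
        if v == user or item not in ratings[v]:
            continue
        w = similarity(ratings[user], ratings[v])
        num += w * ratings[v][item]
        den += w
    return num / den if den else None


# Hypothetical sample data: "near" agrees with "u", "far" disagrees.
ratings = {
    "u": {"a": 5, "b": 1},
    "near": {"a": 5, "b": 1, "x": 4},
    "far": {"a": 1, "b": 5, "x": 1},
}

neighbors = top_neighbors("u", ratings, n=1)
candidates = candidate_items("u", ratings, neighbors)

# Plain user-based: only the neighborhood contributes to the estimate.
neighborhood_only = estimate("u", "x", ratings, neighbors)
# Hybrid: candidates still come from the neighborhood, but everyone who
# rated the item contributes, including the dissimilar user "far".
hybrid = estimate("u", "x", ratings, ratings.keys())

print(neighbors, candidates, neighborhood_only, hybrid)
```

In this toy data the dissimilar user pulls the hybrid estimate for "x"
below the neighborhood-only one, which is the effect I'm wary of.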


> For item-based recommenders, I think the problem of using a fixed "nearest
> item" neighborhood is the fact that any particular user will not have
> ratings for many of the items in that fixed neighborhood -- which renders
> those items being in the neighborhood useless for predicting ratings for

You mean, start with the user-rated items as candidate items, then use
a neighborhood of items around those as the basis for a similarity
computation? Yes exactly, that doesn't work. The user probably hasn't
rated much in that neighborhood.

> that user.  So in this case, it makes more sense to use the user-rated items
> as the neighborhood.  However, in this case, I could see the argument for
> putting an upper limit on this neighborhood size, in case the user has rated

(Oops maybe that's not what you were getting at.)

> a huge number of items.  One could calculate the (e.g.) 500 most similar
> items that a particular user has rated, and use that as the neighborhood,
> instead of all of the (e.g.) 2,000 items that the user rated.  That would
> obviously be a speed optimization analogous to setting the user-user
> neighborhood size, rather than something that would be necessarily expected
> to improve accuracy.
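
The capped variant described above might look roughly like the sketch
below, again in plain Python rather than Mahout code. The function
name estimate_item_based, the item_sim callable, the SIMS table and
the sample ratings are all hypothetical. The user's own rated items
act as the neighborhood, and k optionally keeps only the k rated items
most similar to the target.

```python
def estimate_item_based(user_ratings, target, item_sim, k=None):
    """Estimate the user's rating for `target` from the items the user
    has already rated, weighted by item-item similarity to `target`.
    If k is given, only the k most similar rated items are used."""
    scored = [
        (item_sim(target, j), r)
        for j, r in user_ratings.items()
        if j != target
    ]
    # Most similar rated items first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if k is not None:
        scored = scored[:k]
    num = sum(s * r for s, r in scored)
    den = sum(abs(s) for s, _ in scored)
    return num / den if den else None


# Hypothetical item-item similarities to the target item "t".
SIMS = {("t", "a"): 0.9, ("t", "b"): 0.5, ("t", "c"): 0.1}


def item_sim(i, j):
    return SIMS.get((i, j), 0.0)


user_ratings = {"a": 5, "b": 3, "c": 1}

full = estimate_item_based(user_ratings, "t", item_sim)         # all rated items
capped = estimate_item_based(user_ratings, "t", item_sim, k=2)  # top 2 only

print(full, capped)
```

Dropping the low-similarity tail changes the estimate only slightly
here, since those items carry little weight anyway. That fits the
reading of the cap as mainly a speed optimization.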
