mahout-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Recommending when working with binary data sets
Date Fri, 26 Sep 2008 17:52:56 GMT
Hi,

I've been reading the chapter on recommendations in Programming Collective Intelligence and
looking at Taste.  The examples in PCI all assume scenarios where items to recommend have
been rated by users on some scale.  I understand how items can be recommended to users using
item-based filtering and user-item ratings and why this is preferred over user-based filtering
when the number of users is larger than the number of items.
But what if all I've got is item-item similarity (content-based) and there are no user-item
ratings?  Say I have a situation where people simply either consume content (e.g. read an
article, watch a video...) or do not consume it (don't read the article, don't watch the video...).
In other words, I really have only a yes/no, 1/0, or seen/not-seen type of "rating".

I can't really use Euclidean distance or Pearson correlation coefficient, can I?

What do people use in such scenarios?  Would it make sense to use http://en.wikipedia.org/wiki/Jaccard_index
for such cases?
... Ah, I do see javadoc in TanimotoCoefficientSimilarity saying exactly that, good.
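
To make sure I have the measure right, here is a minimal sketch of the Jaccard/Tanimoto coefficient on seen/not-seen data, written in plain Python rather than against the Taste API. The function name, the user data, and the choice to return 0.0 for two empty sets are my own assumptions for illustration.

```python
def tanimoto(a: set, b: set) -> float:
    """Jaccard/Tanimoto coefficient of two sets: |A & B| / |A | B|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention when neither set has any items
    return len(a & b) / union


# Hypothetical data: the item IDs each user has seen.
seen_u1 = {"a", "b", "c"}
seen_u2 = {"b", "c", "d"}

# 2 shared items ("b", "c") out of 4 distinct items -> 0.5
similarity = tanimoto(seen_u1, seen_u2)
```

Only set membership matters here, so no rating scale is needed.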

But then my question is:
Doesn't the use of Jaccard/Tanimoto mean going back to the expensive user-user similarity
computation?

That is, if I need to recommend items for user U1 don't I need to:
1) have user-user similarity pre-computed (and recomputed periodically)
2) find top N users U{2,3,4,...N} who are the most similar to U1
3) then for these top N most similar users find their "seen" items that U1 has not seen (possibly
limit this to only recently seen items)
4) select top N items from 3) and recommend those to U1.

If so, then 1) is again expensive.
And how would one go about selecting the top N items from the list in this case, other than
ordering them by user-user similarity?

Of course, something is telling me I'm demonstrating that I don't yet have the full grasp
of item-based filtering.  I hope that's the case! :)

Thanks,
Otis
