mahout-user mailing list archives

From "Sean Owen" <sro...@gmail.com>
Subject Re: Recommending when working with binary data sets
Date Tue, 30 Sep 2008 12:34:47 GMT
Sorry for the late reply -- I've been traveling.

On Fri, Sep 26, 2008 at 6:52 PM, Otis Gospodnetic
<otis_gospodnetic@yahoo.com> wrote:
> I've been reading the chapter on recommendations in Programming Collective Intelligence
> and looking at Taste.  The examples in PCI

(PS that is a really good book. Recommended -- highly recommended --
to everyone involved with Mahout. I kinda cross-checked what I had
done against the book and think it agrees. The book suggested more
good ideas, particularly the Tanimoto coefficient business.)

> I can't really use Euclidean distance or Pearson correlation coefficient, can I?

You could, but it wouldn't make much sense. The framework does have
an implementation of Preference that is meant to encapsulate a
binary value like this. Its existence means 'yes', and as far as the
framework is concerned, the user expresses a '1.0' preference for
the item. The actual value doesn't really matter.
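
To illustrate why Pearson correlation carries no information here, the
following is a minimal sketch and not Taste's code. The class and method
names are hypothetical, and it assumes two users are compared only over
items both have expressed a preference for, so every value is 1.0. With
zero variance on both sides the correlation is 0/0.

```java
public class BinaryPearsonSketch {

    // Plain Pearson correlation over two equal-length preference vectors.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0.0, sumY = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // With constant inputs this is 0.0 / 0.0, which is NaN in Java.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Assumption: binary preferences are stored as 1.0 for every co-rated item.
        double[] userA = {1.0, 1.0, 1.0};
        double[] userB = {1.0, 1.0, 1.0};
        System.out.println(pearson(userA, userB)); // prints NaN
    }
}
```

Euclidean distance has a similar problem under the same assumption: it
is 0 for every pair of users, so it cannot rank neighbors either.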

(And yes, it would be more efficient to not have such a simple dummy
implementation of Preference to represent this. I threw it in since it
fits cleanly in the framework. Get it right first -- then make it
fast. If there is interest in these areas then we can start making more
customized versions of User and some of the algorithms that take
advantage of the fact that preferences are binary.)


> What do people use in such scenarios?  Would it make sense to use http://en.wikipedia.org/wiki/Jaccard_index
> for such cases?
> ... Ah, I do see javadoc in TanimotoCoefficientSimilarity saying exactly that, good.
>
> But then my question is:
> Doesn't the use of Jaccard/Tanimoto mean going back to the expensive user-user similarity
> computation?

TanimotoCoefficientSimilarity implements both UserSimilarity and
ItemSimilarity, so it can be plugged into either a user-based or
item-based recommender, which need a UserSimilarity or ItemSimilarity,
respectively. So, no, you aren't forced to use user-based recommenders in
this context.
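
The reason one coefficient can serve both roles is that Tanimoto only
needs two sets. A minimal sketch follows, assuming plain Java
collections rather than Taste's own classes. The class name, method
name, and sample data are hypothetical, and returning 0.0 for two empty
sets is a choice made for this sketch only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TanimotoSketch {

    // Size of the intersection divided by size of the union.
    // Returns 0.0 when both sets are empty (a choice made for this sketch).
    static <T> double tanimoto(Set<T> a, Set<T> b) {
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // User-user: compare the sets of items each user has said 'yes' to.
        Set<String> aliceItems = new HashSet<>(Arrays.asList("A", "B", "C"));
        Set<String> bobItems = new HashSet<>(Arrays.asList("B", "C", "D"));
        System.out.println(tanimoto(aliceItems, bobItems)); // 2 shared / 4 total, prints 0.5

        // Item-item: compare the sets of users who said 'yes' to each item.
        Set<String> usersOfA = new HashSet<>(Arrays.asList("alice"));
        Set<String> usersOfB = new HashSet<>(Arrays.asList("alice", "bob"));
        System.out.println(tanimoto(usersOfA, usersOfB)); // 1 shared / 2 total, prints 0.5
    }
}
```

The same function handles both cases; only the direction in which the
binary data is grouped changes.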
