mahout-user mailing list archives

From Floris Devriendt <florisdevrie...@gmail.com>
Subject Re: Discrete Rating Scale
Date Mon, 14 Jul 2014 16:46:01 GMT
Hey Mario,

Thanks for the fast reply. At the moment I'm not using the Hadoop version,
only the classes under org.apache.mahout.cf.taste.impl.
I assume your reasoning carries over from the Hadoop version, since the
similarity measures are the same.

The similarities I'm going to use are Pearson, Tanimoto, LogLikelihood and an
extended version of the Tanimoto coefficient (one that takes like / dislike
values into account).

If I'm not mistaken, the Tanimoto and LogLikelihood similarities disregard
the preference values by default, so "like" and "dislike" are both treated as
"True" which, as you say, means "interacted with".
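
To make the point concrete, here is a minimal sketch in plain Java. It is not Mahout's TanimotoCoefficientSimilarity class; the class name TanimotoSketch, the tanimoto helper and the item IDs are made up for illustration. It assumes the coefficient is computed only from the sets of item IDs each user has a preference for, so the rating values never enter the calculation.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TanimotoSketch {
    // Tanimoto coefficient: |A intersect B| / |A union B|, computed over the
    // item IDs only. The rating values in the maps are never read.
    static double tanimoto(Map<Long, Integer> a, Map<Long, Integer> b) {
        Set<Long> intersection = new HashSet<>(a.keySet());
        intersection.retainAll(b.keySet());
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? Double.NaN : (double) intersection.size() / union;
    }

    public static void main(String[] args) {
        // Hypothetical data: user A likes items 1 and 2, user B dislikes both.
        Map<Long, Integer> userA = Map.of(1L, 1, 2L, 1);
        Map<Long, Integer> userB = Map.of(1L, -1, 2L, -1);
        // Opposite opinions, yet the two users look identical.
        System.out.println(tanimoto(userA, userB)); // prints 1.0
    }
}
```

Under this assumption, a "dislike" only becomes visible if disliked items are filtered out beforehand or if the similarity is extended to read the values.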

Thanks again for the answers, they were helpful!

Best regards,
Floris Devriendt




2014-07-14 17:52 GMT+02:00 <mario.alemi@gmail.com>:

> If you are using the distributed
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob you should never
> use "0". If you do, multiplying the co-occurrence matrix by the user's
> rating vector zeroes out that item's contribution, which is as if the user
> had never interacted with the item.
>
> For the same reason, "-1" should work, because it actually subtracts score
> from any book that is similar to the one with the negative rating.
>
> For CosineSimilarity, 0 has to be avoided for obvious reasons (the cosine
> is not defined at the origin of the axes), and 1 and 2 are possibly the
> values I'd go for.
>
> Tanimoto and LogLikelihood are True/False, but False means "not
> interacted". Having "dislike = False" would be extremely misleading.
>
> For all the other algorithms, I'd say one should make similar
> considerations.
>
> Cheers
> Mario
>
>
> On Mon, Jul 14, 2014 at 4:21 PM, Floris Devriendt <florisdevriendt@gmail.com> wrote:
>
> > Hey all,
> >
> > When using a discrete rating scale (e.g. likes / dislikes), what are the
> > things that I should consider when using Mahout for Collaborative
> > Filtering?
> >
> > If I'm not mistaken, I read a mail on this mailing list a week or two
> > ago stating that one should avoid using 0 (dislike) and 1 (like) as
> > scores, because Mahout would not be able to take the dislikes into
> > account properly.
> > If this is true, what scores should I give to my like/dislike scale?
> > (e.g. is -1/1 better than 0/1, or should I use 1/2 with 1 = dislike and
> > 2 = like?)
> >
> > Best regards,
> > Floris Devriendt
> >
>
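
Mario's argument about the co-occurrence matrix can be sketched in plain Java. This is only an illustration of the matrix-times-vector step, not the actual RecommenderJob code: the class name CooccurrenceSketch, the scores helper and the 3x3 co-occurrence counts are all made up, and the diagonal is set to 0 for simplicity.

```java
import java.util.Arrays;

public class CooccurrenceSketch {
    // score[i] = sum over j of cooccurrence[i][j] * ratings[j]
    static double[] scores(int[][] cooccurrence, double[] ratings) {
        double[] result = new double[cooccurrence.length];
        for (int i = 0; i < cooccurrence.length; i++) {
            for (int j = 0; j < ratings.length; j++) {
                result[i] += cooccurrence[i][j] * ratings[j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical co-occurrence counts for three items.
        int[][] c = {
            {0, 3, 1},
            {3, 0, 2},
            {1, 2, 0}
        };
        // The user likes item 0, dislikes item 1 and has not seen item 2.
        double[] zeroOne = {1, 0, 0};   // dislike encoded as 0
        double[] minusOne = {1, -1, 0}; // dislike encoded as -1

        // With 0, the dislike contributes nothing: same as "never seen".
        System.out.println(Arrays.toString(scores(c, zeroOne)));  // [0.0, 3.0, 1.0]
        // With -1, item 2 is pushed down because it co-occurs with item 1.
        System.out.println(Arrays.toString(scores(c, minusOne))); // [-3.0, 3.0, -1.0]
    }
}
```

In this toy case the unseen item 2 gets a score of 1 under the 0/1 encoding and -1 under the -1/1 encoding, which is the effect described above.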

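
The cosine remark can be illustrated the same way. The sketch below computes a plain cosine over full rating vectors, which is an assumption made for illustration and not necessarily how Mahout's own cosine similarity handles missing or co-rated items; the class name CosineSketch and the sample vectors are hypothetical.

```java
public class CosineSketch {
    // Plain cosine: dot(a, b) / (|a| * |b|). Returns NaN if either vector
    // has zero length, since the angle is undefined at the origin.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // 0/1 encoding: a user who dislikes everything sits at the origin.
        double[] allDislikes01 = {0, 0, 0};
        double[] other01 = {1, 0, 1};
        System.out.println(Double.isNaN(cosine(allDislikes01, other01))); // true

        // 1/2 encoding (1 = dislike, 2 = like): the same users are comparable.
        double[] allDislikes12 = {1, 1, 1};
        double[] other12 = {2, 1, 2};
        System.out.println(Double.isNaN(cosine(allDislikes12, other12))); // false
    }
}
```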