mahout-user mailing list archives

From Osman Başkaya <>
Subject Re: Problems with Mahout's RecommenderIRStatsEvaluator
Date Sun, 17 Feb 2013 11:56:24 GMT
I am sorry to extend the unsupervised/supervised discussion, which is not
the main question here, but I need to ask.

Sean, I don't understand your last answer. Let's assume our rating scale is
from 1 to 5. We can say that the movies which a particular user rates as
5 are relevant for him/her. 5 is just a number; we can use a *relevance
threshold* like you did, and we can follow the method described in Cremonesi
et al., Performance of Recommender Algorithms on Top-N Recommendation
(*2. Testing Methodology - p.2*).

Are you saying that this job is unsupervised because no user can rate all of
the movies? In that case, we can't be sure that the unrated items in our
predicted top-N list are really irrelevant, because the list may contain
relevant movie(s) which the user simply hasn't rated *yet*. By using this
evaluation procedure we miss them.

In short, the following assumption can be problematic:

> We randomly select 1000 additional items unrated by
> user u. We may assume that most of them will not be
> of interest to user u.

Although larger values of N mostly overcome this problem, the evaluation
still does not seem totally supervised.
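
To make the procedure I am referring to concrete, here is a rough sketch of
the testing methodology from Cremonesi et al. as I understand it. It is not
Mahout's RecommenderIRStatsEvaluator; the names evaluate_top_n, score,
rated_by_user and test_items are hypothetical and only for illustration.
Every sampled unrated item is treated as "not relevant", which is exactly
the assumption questioned above.

```python
import random


def evaluate_top_n(test_items, all_items, rated_by_user, score,
                   n=10, n_distractors=1000, seed=0):
    """Sketch of the top-N test (all names are hypothetical).

    test_items:    list of (user, item) held-out pairs, each rated at the
                   relevance threshold (e.g. 5 stars).
    all_items:     iterable of every item id in the catalogue.
    rated_by_user: dict user -> set of items the user has rated
                   (including the held-out test items).
    score:         callable (user, item) -> predicted preference.
    """
    if not test_items:
        raise ValueError("test_items must not be empty")

    rng = random.Random(seed)
    hits = 0
    for user, item in test_items:
        # Items the user never rated; ASSUMED to be irrelevant.
        unrated = [i for i in all_items if i not in rated_by_user[user]]
        distractors = rng.sample(unrated, min(n_distractors, len(unrated)))

        # Rank the held-out item together with the sampled distractors.
        candidates = distractors + [item]
        ranked = sorted(candidates, key=lambda i: score(user, i),
                        reverse=True)

        # A hit means the known-relevant item made it into the top N.
        if item in ranked[:n]:
            hits += 1

    recall = hits / len(test_items)
    # Each test case has exactly one known-relevant item among the
    # candidates, so precision at N is recall at N divided by N.
    precision = recall / n
    return recall, precision
```

If one of the sampled "distractors" is in fact a movie the user would rate
5, it can push the held-out item out of the top N and the case is counted
as a miss, so the measured recall is a pessimistic estimate.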

On Sun, Feb 17, 2013 at 1:49 AM, Sean Owen <> wrote:

> The very question at hand is how to label the data as "relevant" and "not
> relevant" results. The question exists because this is not given, which is
> why I would not call this a supervised problem. That may just be semantics,
> but the point I wanted to make is that the reasons choosing a random
> training set are correct for a supervised learning problem are not reasons
> to determine the labels randomly from among the given data. It is a good
> idea if you're doing, say, logistic regression. It's not the best way here.
> This also seems to reflect the difference between whatever you want to call
> this and your garden variety supervised learning problem.
> On Sat, Feb 16, 2013 at 11:15 PM, Ted Dunning <>
> wrote:
> > Sean
> >
> > I think it is still a supervised learning problem in that there is a
> > labelled training data set and an unlabeled test data set.
> >
> > Learning a ranking doesn't change the basic dichotomy between supervised
> > and unsupervised.  It just changes the desired figure of merit.
> >

Osman Başkaya
Koc University
MS Student | Computer Science and Engineering
