mahout-user mailing list archives

From Sean Owen <>
Subject Re: evaluating recommender with boolean prefs
Date Fri, 07 Jun 2013 20:50:11 GMT
It depends on the algorithm I suppose. In some cases, the
already-known items would always be top recommendations and the test
would tell you nothing. Just like in an RMSE test -- if you already
know the right answers your score is always a perfect 0.

But in some cases, I agree, you could get some use out of observing
where the algorithm ranks known associations, because they won't
always all be the very first ones.

It raises an interesting question: if the top recommendation wasn't an
already-known association, how do we know it's "wrong"? We don't. You
rate Star Trek, Star Trek V, and Star Trek IV. Say Star Trek II is
your top recommendation. That's actually probably right, and should be
ranked higher than all your observed associations. (It's a good
movie.) But the test would consider it wrong. In fact, anything that
you haven't interacted with before counts as "wrong".

This sort of explains why precision/recall can be really low in these
tests. I would not be surprised if you get 0 in some cases, maybe on
small input. Is it a bad predictor? Maybe, but it's not clear.
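To make that concrete, here is a minimal sketch (plain Java, not the Mahout evaluator; item names are invented) of precision@k where only held-out items count as relevant, so a perfectly sensible recommendation the user simply never saw can only lower the score:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrecisionAtK {

  // Precision@k: fraction of the top-k recommendations that appear in
  // the held-out set. Anything the user never interacted with counts
  // as "wrong", even if it is actually a good recommendation.
  static double precisionAtK(List<String> ranked, Set<String> heldOut, int k) {
    int hits = 0;
    for (String item : ranked.subList(0, Math.min(k, ranked.size()))) {
      if (heldOut.contains(item)) {
        hits++;
      }
    }
    return (double) hits / k;
  }

  public static void main(String[] args) {
    // Held-out interactions for one user; "ST2" is a plausible
    // recommendation but was never observed, so it can only hurt.
    Set<String> heldOut = new HashSet<>(Arrays.asList("ST4", "ST5"));
    List<String> ranked = Arrays.asList("ST2", "ST4", "Alien", "ST5");
    // Only 1 hit ("ST4") in the top 3, so precision@3 is 1/3
    System.out.println(precisionAtK(ranked, heldOut, 3));
  }
}
```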

On Fri, Jun 7, 2013 at 8:06 PM, Koobas <> wrote:
> Since I am primarily an HPC person, probably a naive question from the ML
> perspective.
> What if, when computing recommendations, we don't exclude what the user
> already has,
> and then see if the items he has end up being recommended to him (compute
> some appropriate metric / ratio)?
> Wouldn't that be the ultimate evaluator?
> On Fri, Jun 7, 2013 at 2:58 PM, Sean Owen <> wrote:
>> In point 1, I don't think I'd say it that way. It's not true that
>> test/training is divided by user, because every user would either be
>> 100% in the training or 100% in the test data. Instead you hold out
>> part of the data for each user, or at least, for some subset of users.
>> Then you can see whether recs for those users match the held out data.
>> Yes, then you see how the held-out set matches the predictions by
>> computing ratios that give you precision/recall.
>> The key question is really how you choose the test data. It's implicit
>> data; one is as good as the next. In the framework I think it just
>> randomly picks a subset of the data. You could also split by time;
>> that's a defensible way to do it. Training data up to time t and test
>> data after time t.
>> On Fri, Jun 7, 2013 at 7:51 PM, Michael Sokolov
>> <> wrote:
>> > I'm trying to evaluate a few different recommenders based on boolean
>> > preferences.  The In Action book suggests using a precision/recall
>> > metric, but I'm not sure I understand what that does, and in
>> > particular how it is dividing my data into test/train sets.
>> >
>> > What I think I'd like to do is:
>> >
>> > 1. Divide the test data by user: identify a set of training data with
>> > data from 80% of the users, and test using the remaining 20% (say).
>> >
>> > 2. Build a similarity model from the training data
>> >
>> > 3. For the test users, divide their data in half: a "training" set and
>> > an evaluation set.  Then for each test user, use their training data
>> > as input to the recommender, and see if it recommends the data in the
>> > evaluation set or not.
>> >
>> > Is this what the precision/recall test is actually doing?
>> >
>> > --
>> > Michael Sokolov
>> > Senior Architect
>> > Safari Books Online
>> >
