mahout-user mailing list archives

From PierLorenzo Bianchini <>
Subject Re: evaluating recommender
Date Sat, 11 Apr 2015 20:44:47 GMT
Oh right, why didn't I think of that :) You're totally right.
Thanks a bunch! Concerning MAP and the other methods you mentioned: nice, I had a quick look.
I'll definitely dig deeper after I'm done with my submission (tomorrow night...).


On Fri, 4/10/15, Pat Ferrel <> wrote:

 Subject: Re: evaluating recommender
 To: "" <>
 Date: Friday, April 10, 2015, 11:42 PM
 I think that depends on
 the rating range you are using. It measures the error
 between predicted and actual rating. Google RMSE for a
 better explanation.
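To make the point about the rating range concrete, here is a minimal sketch of RMSE in plain Python. It is not Mahout's evaluator code; the function name `rmse` and the ratings are made up for illustration. The same predictions are scored once on a 1-5 scale and once with every rating doubled.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings on a 1-5 scale (invented for this example).
actual_5 = [4.0, 2.0, 5.0, 3.0]
predicted_5 = [3.0, 3.0, 4.0, 4.0]

# The same ratings with every value doubled, i.e. a wider rating range.
actual_10 = [2 * r for r in actual_5]
predicted_10 = [2 * r for r in predicted_5]

print(rmse(predicted_5, actual_5))    # every prediction is off by 1 -> 1.0
print(rmse(predicted_10, actual_10))  # every prediction is off by 2 -> 2.0
```

The score doubles although the recommender is equally good in both cases, so an RMSE value only means something relative to the rating scale it was computed on.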
 That is an old and not very good metric. It was popularized
 by the Netflix prize many years ago when they thought they
 wanted to predict ratings. Actually even Netflix admits that
 _ranking_ recs is far more important. If you can only show a
 few recs they had better be ranked the best you can. For
 this a precision metric is better. I use mean average
 precision (MAP).
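For reference, a rough sketch of how MAP can be computed, in plain Python rather than with any Mahout class. The names `average_precision` and `mean_average_precision` and the sample data are hypothetical. This variant divides by the number of held-out relevant items per user; other definitions (for example with a cutoff k) exist.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank: relevant items seen so far / rank.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(recommendations, held_out):
    """Mean of the per-user average precision over all users in held_out."""
    scores = [
        average_precision(recommendations.get(user, []), items)
        for user, items in held_out.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical data: ranked recommendations and held-out "liked" items.
recommendations = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z"]}
held_out = {"u1": {"a", "c"}, "u2": {"z"}}

print(mean_average_precision(recommendations, held_out))
```

Unlike RMSE, this rewards putting the relevant items near the top of the list: u1 has hits at ranks 1 and 3, while u2's only hit is at rank 3 and scores lower.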
 Be aware
 also that using an offline metric to judge different
 algorithms is not very reliable. Online A/B or Bayesian
 Bandit tests are much better.
 On Apr 10, 2015, at 5:49 AM, PierLorenzo
 Bianchini <> wrote:
 Hi all,
 I have a question on the results of an
 evaluation (I'm using
 I'm getting a result of
 "0.7432629235004433" with one of the recommenders
 I'm testing. I read in several places that
 "0.0" would be the perfect result, but I
 couldn't find which ranges are acceptable.
 I've seen values ranging from 0.49 to 1.04
 with different implementations (I mostly do user-based with
 Pearson and model based with SVD transformations) and
 setting different parameters. I've also seen values up
 to 3.0 but I was testing "bad" cases (low amount
 of data used, bad percentage of training data, etc.; I guess
 I could get results even worse than that but I didn't
 try it)
 When can I consider that my
 recommender is "good enough", when should I
 consider that my evaluation is too bad? (for now I randomly
 assumed that 0.9 is a good value and I'm trying to stick
 around that value)
 Perhaps someone knows
 where I could find documentation for this? Any help would
 be appreciated.
 Thank you! Regards,
 *FYI* I have a user/movie/rating dataset. 6000
 users for 3900 movies. I have a static training file with
 800,000 triplets and I'm using them to evaluate
 different types of recommender (this is a university
 requirement, I'm not talking about production
