From: Ted Dunning <ted.dunn...@gmail.com>
Subject: Re: Recommendations on binary ratings
Date: Sat, 26 Jun 2010 20:16:19 GMT
Pranay,

Sean's comments are dead-on.  You may be able to get a feel for how good (or
not) these results are by marking all unrated items either as good or as bad.
That will likely tell you that the real precision is somewhere between 0.22
and 0.9.  The same problem is exhibited by essentially all other off-line
evaluation techniques, since you can only evaluate against things that you
already know, not against the chance that a recommender is bringing new and
valuable information to the game.
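As a rough illustration of that kind of off-line measurement, here is a
minimal sketch of precision/recall at 20 using Mahout's Taste IR-stats
evaluator with a Tanimoto item-item recommender on boolean preferences.  The
class name and the input file name are made up for the example, exact Taste
class names can shift a bit between Mahout versions, and note that it can only
score against held-out items that are already known, which is exactly the
limitation above.

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class OfflinePrecisionAtN {
  public static void main(String[] args) throws Exception {
    // Hypothetical userID,itemID file of binary (purchase/like) preferences.
    DataModel model = new FileDataModel(new File("binary-prefs.csv"));

    RecommenderBuilder builder = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        // Item-item similarity that ignores rating values, suitable for binary data.
        return new GenericItemBasedRecommender(dataModel,
            new TanimotoCoefficientSimilarity(dataModel));
      }
    };

    // Precision/recall at 20, scored only against the held-out items each user
    // is already known to like -- the off-line limitation discussed above.
    IRStatistics stats = new GenericRecommenderIRStatsEvaluator().evaluate(
        builder, null, model, null,
        20,                                               // top-20 recommendations
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,
        1.0);                                             // evaluate all users

    System.out.println("precision@20 = " + stats.getPrecision());
    System.out.println("recall@20    = " + stats.getRecall());
  }
}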

The only good way to evaluate a recommender is in a live situation, and the
metric you should use is whatever business metric makes the most sense but can
still be measured quickly.  On this basis, one-year retention of paying
customers is a nice metric, except that it takes a year to measure.  At the
other end of the spectrum is click-through rate, which has the problem of not
being a real business metric but is quick to measure.  There is also some
evidence that click-through is a reasonable surrogate for engagement (see
here, for example:
http://www.deepdyve.com/lp/association-for-computing-machinery/predicting-bounce-rates-in-sponsored-search-advertisements-9gQZJ4nxoW).
I have had some problems with click-through, though, in situations where short
labels were very deceptive and thus had high click rates and high bounce
rates.  Any better engagement indicator that you can get is good.

The real reason that on-line evaluation is critical is that your recommender
is part of the loop.  As it surfaces items that users are interested in, it
learns and then hopefully surfaces better items.  A different recommender
might well go off in a different direction.  Neither one would necessarily
provide good test data for the other.

Typically the way you deal with this is to segment your user population and
provide alternative recommendations to different people.
 You can also get reasonable data by blending the recommended items from
several recommenders into composite recommendations, but that leads to very
difficult problems of placement bias.
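A minimal sketch of one simple way to do that segmentation: deterministically
hash each user id into a bucket so that every user consistently sees the same
recommender and the feedback loops stay separate.  The class name and the
variant labels here are hypothetical.

import java.security.MessageDigest;

public class RecommenderBucketing {

  // Hypothetical variant labels; in practice each maps to a real recommender.
  private static final String[] VARIANTS = {"itemItemTanimoto", "userBasedLogLikelihood"};

  // Stable assignment: the same user always lands in the same bucket.
  static String variantFor(long userId) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    byte[] hash = md5.digest(Long.toString(userId).getBytes("UTF-8"));
    return VARIANTS[(hash[0] & 0xFF) % VARIANTS.length];
  }

  public static void main(String[] args) throws Exception {
    for (long userId : new long[] {42L, 4711L, 9000L}) {
      System.out.println(userId + " -> " + variantFor(userId));
    }
  }
}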

Good luck with this.  We would love to hear more as you continue your
efforts.

On Sat, Jun 26, 2010 at 11:22 AM, pranay venkata <svpranay@gmail.com> wrote:

> Hi,
> Thanks to all for the immediate responses.
>
> I have tested my binary recommender on the MovieLens one-million dataset by
> dividing it into an 80% train / 20% test split, and I observe an average
> precision of 0.22 (i.e., out of 20 recommendations produced by the
> recommender, there are around 5 matches with items in the test set) and an
> average recall of 0.0135 for my recommendations.
> I would like to know how good these recommendations are given those
> precision and recall values.  How do I estimate the quality of a recommender
> from its precision and recall, and how much could it practically be
> improved?
>
> Thanks,
> svpranay.
>
> On Fri, Jun 11, 2010 at 6:00 PM, pranay venkata <svpranay@gmail.com>
> wrote:
>
> > Hi,
> >
> > I'm a newbie to Mahout.  My aim is to produce recommendations on binary
> > user-purchase data.  So I applied an item-item similarity model to compute
> > top-N recommendations for the MovieLens data, treating ratings of 1-3 as 0
> > and ratings of 4-5 as 1.  Then I tried evaluating my recommendations
> > against the ratings in the test data, but there were hardly two or three
> > matches between my top-20 recommendations and the top-rated items in the
> > test data, and no match at all for most users.
> >
> > So are my recommendations totally bad by nature, or do I need to use a
> > different measure for evaluating them?
> >
> > Please help me!  Thanks in advance.
> >
> > Pranay, 2nd-year UG student.
> >
> >
>
>
> --
> regards
> svpranay
>
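
As a footnote on the binary setup described in the quoted message, here is a
minimal sketch of the thresholding step (MovieLens ratings of 4-5 kept as
boolean preferences, 1-3 dropped).  The file names are only examples; the
output is a plain userID,itemID file of the sort Taste's FileDataModel can
read, e.g. for the evaluation sketch earlier in this message.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;

public class RatingsToBooleanPrefs {
  public static void main(String[] args) throws Exception {
    // MovieLens 1M lines look like userID::movieID::rating::timestamp.
    BufferedReader in = new BufferedReader(new FileReader("ratings.dat"));
    PrintWriter out = new PrintWriter(new FileWriter("binary-prefs.csv"));
    String line;
    while ((line = in.readLine()) != null) {
      String[] fields = line.split("::");
      if (Double.parseDouble(fields[2]) >= 4.0) {  // keep 4-5 star ratings as a "1"
        out.println(fields[0] + ',' + fields[1]);
      }
    }
    in.close();
    out.close();
  }
}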
