mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: taking mahout into production
Date Fri, 20 May 2011 18:01:23 GMT
Sean will be able to address scaling and configuration better than I can,
but I have built video recommendation systems before and found that:

a) ratings are nearly worthless, largely because so few people will rate
things

b) the best preference data we ever found was whether the user viewed the
asset for longer than 30 seconds.  This is a binary preference, and it helps
to have it that way since you can make use of a number of economies.

c) some randomization in recommendations is very important so that you
preserve some exploratory behavior.  I implemented this by adding small
amounts of noise to recommendation scores to perturb the ranking.
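
As a rough illustration of points (b) and (c), here is a minimal Python
sketch.  It is not the code from the systems described above.  The log
layout, the function names, the use of Gaussian noise, and the noise_scale
value are all assumptions made for the example; only the 30-second threshold
and the idea of perturbing scores come from the message.

```python
import random

VIEW_THRESHOLD_SECONDS = 30  # threshold mentioned in the message


def binary_preferences(view_log, threshold=VIEW_THRESHOLD_SECONDS):
    """Return the set of (user, asset) pairs viewed longer than `threshold`.

    `view_log` is an iterable of (user_id, asset_id, seconds_viewed) tuples.
    This log layout is an assumption for illustration only.
    """
    return {
        (user, asset)
        for user, asset, seconds in view_log
        if seconds > threshold
    }


def perturbed_ranking(scores, noise_scale=0.05, rng=None):
    """Rank asset ids by score plus a small amount of random noise.

    `scores` maps asset_id -> recommendation score.  Gaussian noise and
    `noise_scale` (its standard deviation) are assumed choices; a scale of
    0 reproduces the plain score ordering.
    """
    rng = rng or random.Random()
    noisy = {
        asset: score + rng.gauss(0.0, noise_scale)
        for asset, score in scores.items()
    }
    # Highest noisy score first.
    return sorted(noisy, key=noisy.get, reverse=True)


if __name__ == "__main__":
    log = [("u1", "a", 45), ("u1", "b", 10), ("u2", "c", 31)]
    print(binary_preferences(log))
    print(perturbed_ranking({"a": 0.9, "b": 0.5, "c": 0.1}))
```

The noise scale would need tuning against the spread of real scores: too
small and the ranking never changes, too large and the ranking becomes
mostly random.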

On Fri, May 20, 2011 at 10:31 AM, Varnit Khanna <varnitk@gmail.com> wrote:

> Hi,
> I have been considering using mahout for our recommendation engine
> needs and had a couple of questions about using it in production.
>
> Use Case:
> We need to provide recommendations on video assets (similar to hulu) to
> a couple of million users, and we have over 100K assets. Since we are
> experiencing growth in both users and assets, I am planning to use
> mahout on hadoop.
>
> Preference Data:
> Currently we do not have a ratings system built into our video
> player/page but we do have logs on user impressions on video assets
> which I will be feeding into RecommenderJob. Until we build a ratings
> system I am planning on using the following preference data:
>
> Impressions | Rating
>           1 | (empty)
>           2 | 2
>           3 | 3
>           4 | 4
>         >=5 | 5
>
> Does this preference data make sense? I will be using the standard
> RecommenderJob to generate recommendations until I get a better
> understanding of mahout.
>
> Questions:
> 1) What will be the best approach to deal with cold start on new
> assets and users?
> 2) Is it typical to parse the entire dataset in production to generate
> recommendations for new assets and users or can it be done
> incrementally?
> 3) Which is the better approach for this use case: item-based or user-based CF?
> Also at some point in the future we would like to generate
> recommendations on news assets so a single system might be beneficial.
>
> Thanks
> -varnit
>
