mahout-user mailing list archives

From Varnit Khanna
Subject taking mahout into production
Date Fri, 20 May 2011 17:31:44 GMT
I have been considering using Mahout for our recommendation engine
needs and have a couple of questions about using it in production.

Use Case:
We need to provide recommendations on video assets (similar to Hulu) to
a couple of million users, and we have over 100K assets. Since we are
experiencing growth in both users and assets, I am planning to use
Mahout on Hadoop.

Preference Data:
Currently we do not have a ratings system built into our video
player/page, but we do have logs of user impressions on video assets,
which I will be feeding into RecommenderJob. Until we build a ratings
system, I am planning on using the following preference data:

Impressions | Rating
          1 | (empty)
          2 | 2
          3 | 3
          4 | 4
        >=5 | 5

Does this preference data make sense? I will be using the standard
RecommenderJob to generate recommendations until I get a better
understanding of Mahout.
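
For concreteness, the impression-to-rating mapping in the table above could
be sketched roughly as follows. This is only an illustration: the class and
method names (ImpressionPreferences, toRating, toCsvLine) are hypothetical,
it assumes "(empty)" means that no preference is written for a user/asset
pair with a single impression, and it assumes the job input is plain
"userID,itemID,preference" text lines.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of Mahout: turns impression counts into
// the 2..5 ratings from the table above.
final class ImpressionPreferences {
    private ImpressionPreferences() {}

    /**
     * Maps an impression count to a rating.
     * Assumption: one impression (or fewer) yields no rating at all.
     */
    static OptionalInt toRating(int impressions) {
        if (impressions <= 1) {
            return OptionalInt.empty();
        }
        // 2, 3 and 4 map to themselves; 5 or more is capped at 5.
        return OptionalInt.of(Math.min(impressions, 5));
    }

    /**
     * Formats one "userID,itemID,preference" line, or returns null when
     * the pair has no rating and should be left out of the input.
     */
    static String toCsvLine(long userId, long itemId, int impressions) {
        OptionalInt rating = toRating(impressions);
        return rating.isPresent()
                ? userId + "," + itemId + "," + rating.getAsInt()
                : null;
    }
}
```

With this sketch, a user/asset pair with 9 impressions would be written with
a preference of 5, and a pair with a single impression would be skipped.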

1) What would be the best approach to dealing with cold start for new
assets and users?
2) Is it typical to parse the entire dataset in production to generate
recommendations for new assets and users or can it be done
3) Which is the better approach for this use case: item-based or
user-based CF? Also, at some point in the future we would like to generate
recommendations on news assets, so a single system might be beneficial.

