mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Newbie question on modeling a Recommender using Mahout when the matrix is sparse
Date Thu, 13 Sep 2012 06:36:57 GMT
Well, there are only 7 products in the universe! If you ask for 10
recommendations, you will always get every unrated item back. That
holds unless the algorithm can't actually establish a value for some
items.

What result were you expecting: fewer than 10 recs? Fewer than 7?
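
To make the point concrete, here is a minimal plain-Java sketch. It is not
Mahout's API: the class name TopNSketch, the recommend and sampleScores
helpers, and the scores themselves are all made up for illustration. It only
assumes that a recommender can estimate some score for every item the user
does not already have, and shows that asking for more items than there are
candidates returns every candidate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TopNSketch {

    // Hypothetical stand-in for a recommender: rank every item the user does
    // not own by its estimated score and keep at most howMany of them.
    static List<String> recommend(Map<String, Double> estimatedScores,
                                  Set<String> owned, int howMany) {
        List<Map.Entry<String, Double>> candidates = new ArrayList<>();
        for (Map.Entry<String, Double> e : estimatedScores.entrySet()) {
            if (!owned.contains(e.getKey())) {
                candidates.add(e);
            }
        }
        // Highest estimated score first.
        candidates.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        int limit = Math.min(howMany, candidates.size());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : candidates.subList(0, limit)) {
            result.add(e.getKey());
        }
        return result;
    }

    // Made-up scores for a 7-item universe (illustrative values only).
    static Map<String, Double> sampleScores() {
        Map<String, Double> scores = new HashMap<>();
        scores.put("A", 0.9);
        scores.put("B", 0.4);
        scores.put("C", 0.7);
        scores.put("D", 0.2);
        scores.put("E", 0.5);
        scores.put("F", 0.1);
        scores.put("G", 0.6);
        return scores;
    }

    public static void main(String[] args) {
        Set<String> owned = new HashSet<>(Arrays.asList("A", "D"));
        // Only 5 items are unowned, so asking for 10 returns all 5 of them.
        System.out.println(recommend(sampleScores(), owned, 10));
        // Asking for fewer than the number of candidates is where ranking matters.
        System.out.println(recommend(sampleScores(), owned, 2));
    }
}
```

With so few items, the ranking only becomes visible if you ask for fewer
recommendations than there are unowned products, or look at the order of the
returned list rather than its membership.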

On Thu, Sep 13, 2012 at 6:55 AM, Gokul Pillai <gokooltech@gmail.com> wrote:
> I am trying out Mahout to come up with product recommendations for users
> based on data that show what products they use today.
> The data is not web-scale, just about 300,000 users and 7 products. Few
> comments about the data here:
> 1. Since users either have or do not have a particular product, the value in
> the matrix is either "1" or "0" for all the columns (rows being the userids)
> 2. All the users have one basic product, so I excluded it from the
> data-model passed to the Mahout recommender since I assume that if everyone
> has the same product, its effect on the recommendations is trivial.
> 3. The matrix itself is sparse; the total counts of users having each
> product are:
> A=31847 | 54754 | 1897 | 23154 | 2201 | 2766 | 33585
>
> Steps followed:
> 1. Created a data-source from the user-product table in the database
>         File ratingsFile = new File("datasets/products.csv");
>         DataModel model = new FileDataModel(ratingsFile);
> 2. Created a recommender on this data
>         CachingRecommender recommender = new CachingRecommender(new SlopeOneRecommender(model));
> 3. Loop through all users and get the top ten recommendations:
>         List<RecommendedItem> recommendations = recommender.recommend(userId, 10);
>
> Issue faced:
> The problem I am facing is that the recommendations that come out are way
> too simple: what seems to be recommended is "if a user does not have
> product A, then recommend it; if they don't have product B, then recommend
> it; and so on." Basically a simple inverse of their ownership status.
>
> Obviously, I am not doing something right here. How can I do the modeling
> better to get the right recommendations? Or is it that my dataset (300,000
> users times 7 products) is too small for Mahout to work with?
>
> Look forward to your comments. Thanks.
