mahout-user mailing list archives

From Davide Pozza <davide.po...@gmail.com>
Subject Re: difference between precomputed and on-the-fly processed data
Date Thu, 20 Sep 2012 14:35:03 GMT
Hello Sebastian

thanks for the reply.

After switching from GenericItemBasedRecommender to
GenericBooleanPrefItemBasedRecommender, I get the following results:

FIRST (precomputed similarities from ItemSimilarityJob)
RecommendedItem[item:4140, value:2.7275915]
RecommendedItem[item:3982, value:2.7191503]
RecommendedItem[item:1377, value:2.7180452]
RecommendedItem[item:2706, value:2.7041116]
RecommendedItem[item:4010, value:2.702695]
-----------------

SECOND (LogLikelihoodSimilarity computed on the fly)
RecommendedItem[item:4140, value:4.4948235]
RecommendedItem[item:2108, value:4.3325663]
RecommendedItem[item:1968, value:4.330123]
RecommendedItem[item:2835, value:4.3260937]
RecommendedItem[item:2902, value:4.3107653]
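
The values are no longer all 1.0 because, as far as I understand the two
classes, GenericItemBasedRecommender estimates a preference as a
similarity-weighted average of the user's ratings, while
GenericBooleanPrefItemBasedRecommender sums the similarities between the
candidate item and the items the user already has. The plain-Java sketch
below illustrates that arithmetic only; it does not use Mahout, and the
class name, method names and similarity values are made up for illustration.

```java
public class EstimateSketch {

    // Similarity-weighted average of the user's ratings (assumed behaviour
    // of a rating-based item recommender; not Mahout code).
    public static double weightedAverage(double[] sims, double[] prefs) {
        double weightedSum = 0.0;
        double simSum = 0.0;
        for (int i = 0; i < sims.length; i++) {
            weightedSum += sims[i] * prefs[i];
            simSum += sims[i];
        }
        return weightedSum / simSum;
    }

    // Plain sum of similarities (assumed behaviour of a boolean-preference
    // item recommender; not Mahout code).
    public static double similaritySum(double[] sims) {
        double sum = 0.0;
        for (double s : sims) {
            sum += s;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Boolean data: every "rating" is 1.
        double[] prefs = {1.0, 1.0, 1.0};
        // Hypothetical similarities of two candidate items to the user's items.
        double[] strong = {0.9, 0.8, 0.7};
        double[] weak = {0.3, 0.2, 0.1};

        System.out.println("weighted average: "
                + weightedAverage(strong, prefs) + " vs " + weightedAverage(weak, prefs));
        System.out.println("similarity sum:   "
                + similaritySum(strong) + " vs " + similaritySum(weak));
    }
}
```

With every rating equal to 1, the weighted average is 1.0 for both
candidates, so they tie and the ordering is arbitrary. The sums differ, so
the candidates can actually be ranked.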

Could the difference be due to the pruning you mentioned?
If so, which of the two implementations do you think should be considered
better?
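
To make the pruning question concrete: if the precomputed file keeps only
the stronger item pairs, a recommender that sums similarities sees fewer
terms for the same candidate item and so reports a lower score. The sketch
below assumes that pruning simply drops pairs under a cutoff and that a
missing pair shows up as NaN and is skipped. Both are simplifications, the
names and numbers are hypothetical, and the actual pruning options of
ItemSimilarityJob should be checked against its documentation.

```java
public class PruningSketch {

    // Hypothetical pruning: pairs below the cutoff are dropped (marked NaN).
    public static double[] pruneBelow(double[] sims, double cutoff) {
        double[] pruned = new double[sims.length];
        for (int i = 0; i < sims.length; i++) {
            pruned[i] = sims[i] < cutoff ? Double.NaN : sims[i];
        }
        return pruned;
    }

    // Sum of the similarities that are still known, skipping NaN entries.
    public static double sumKnown(double[] sims) {
        double sum = 0.0;
        for (double s : sims) {
            if (!Double.isNaN(s)) {
                sum += s;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up similarities between one candidate item and the user's items.
        double[] full = {0.9, 0.8, 0.05, 0.02};
        double[] pruned = pruneBelow(full, 0.1);

        System.out.println("on-the-fly sum:  " + sumKnown(full));
        System.out.println("precomputed sum: " + sumKnown(pruned));
    }
}
```

The pruned sum is lower than the full sum, which is at least consistent with
the FIRST scores being lower than the SECOND ones, though it does not prove
that pruning is the cause.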

Thanks again

Davide

2012/9/20 Sebastian Schelter <ssc@apache.org>

> You should also be aware that ItemSimilarityJob applies some pruning by
> default, which can also be a reason for the different results.
>
> Best,
> Sebastian
>
> On 20.09.2012 15:19, Sean Owen wrote:
> > The problem is that you have boolean data with no ratings, so all the
> > ratings are 1. But you are using GenericItemBasedRecommender, which
> > expects ratings. Since it ranks on estimated ratings, and all ratings
> > are 1, the result is essentially random.
> >
> > Use GenericBooleanPrefItemBasedRecommender.
> >
> > On Thu, Sep 20, 2012 at 2:04 PM, Davide Pozza <davide.pozza@gmail.com> wrote:
> >> Hello
> >>
> >> I'm trying to understand how to develop an item-based recommendation
> >> module for an ecommerce website.
> >>
> >> Here's my input data.csv file format:
> >>
> >> USER_ID,ITEM_ID
> >>
> >> (the data comes from the order history, so I have no ratings to use)
> >>
> >> If I understand the documentation correctly, the following
> >> implementations should be equivalent (the first one just uses the
> >> precomputed data), but they return different results.
> >> Could anyone help me understand why?
> >>
> >> FIRST IMPLEMENTATION
> >> ====================
> >> // Input format: user_id,item_id
> >> DataModel dataModel = new FileDataModel(new File("data.csv"));
> >>
> >> // Precomputed similarities generated by ItemSimilarityJob with
> >> // SIMILARITY_LOGLIKELIHOOD
> >> ItemSimilarity similarity =
> >>     new FileItemSimilarity(new File("precomputed_data"));
> >>
> >> GenericItemBasedRecommender recommender =
> >>     new GenericItemBasedRecommender(dataModel, similarity);
> >>
> >> long userId = 8500003;
> >> List<RecommendedItem> recommendations =
> >>     recommender.recommend(userId, 5);
> >> for (RecommendedItem recommendation : recommendations) {
> >>     System.out.println(recommendation);
> >> }
> >>
> >> ==RESULT==
> >> RecommendedItem[item:1653, value:1.0]
> >> RecommendedItem[item:14, value:1.0]
> >> RecommendedItem[item:1592, value:1.0]
> >> RecommendedItem[item:25, value:1.0]
> >> RecommendedItem[item:43, value:1.0]
> >>
> >> SECOND IMPLEMENTATION
> >> ======================
> >> // Input format: user_id,item_id
> >> DataModel dataModel = new FileDataModel(new File("data.csv"));
> >>
> >> ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
> >>
> >> GenericItemBasedRecommender recommender =
> >>     new GenericItemBasedRecommender(dataModel, similarity);
> >>
> >> long userId = 8500003;
> >> List<RecommendedItem> recommendations =
> >>     recommender.recommend(userId, 5);
> >> for (RecommendedItem recommendation : recommendations) {
> >>     System.out.println(recommendation);
> >> }
> >>
> >> ==RESULT==
> >> RecommendedItem[item:28, value:1.0]
> >> RecommendedItem[item:14, value:1.0]
> >> RecommendedItem[item:20, value:1.0]
> >> RecommendedItem[item:21, value:1.0]
> >> RecommendedItem[item:25, value:1.0]
> >>
> >> --
> >> Davide Pozza
>
>


-- 
Davide Pozza
