mahout-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: difference between precomputed and on-the-fly processed data
Date Thu, 20 Sep 2012 13:33:55 GMT
You should also be aware that ItemSimilarityJob applies some pruning by
default, which can also be a reason for the different results.

Best,
Sebastian
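
To illustrate how pruning alone can change the output, here is a minimal
sketch in plain Java, not Mahout code. It assumes a pruning rule that keeps
only the k most similar items per item, which is what I understand the job's
maxSimilaritiesPerItem option to do. The class PruningSketch, the helper
topK, the item IDs and all similarity values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PruningSketch {

    // Hypothetical helper: keep only the k highest-scoring neighbours of one item.
    static Map<Long, Double> topK(Map<Long, Double> similarities, int k) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(similarities.entrySet());
        // Sort by similarity, highest first.
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        Map<Long, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<Long, Double> e : entries.subList(0, Math.min(k, entries.size()))) {
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up similarities between one item and four other items.
        Map<Long, Double> full = new LinkedHashMap<>();
        full.put(101L, 0.9);
        full.put(102L, 0.6);
        full.put(103L, 0.3);
        full.put(104L, 0.1);

        Map<Long, Double> pruned = topK(full, 2);

        // An on-the-fly similarity sees all four neighbours,
        // a pruned file only the two strongest.
        System.out.println("unpruned neighbours: " + full.keySet());
        System.out.println("pruned neighbours:   " + pruned.keySet());
    }
}
```

Any candidate whose only link to the user's items was one of the dropped
pairs can no longer be recommended from the precomputed file, while the
on-the-fly similarity still finds it.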

On 20.09.2012 15:19, Sean Owen wrote:
> The problem is that you have boolean data with no ratings, so all the
> ratings are 1. But you are using GenericItemBasedRecommender, which
> expects ratings. Since it ranks on estimated ratings, and all ratings
> are 1, the result is essentially random.
> 
> Use GenericBooleanPrefItemBasedRecommender.
> 
> On Thu, Sep 20, 2012 at 2:04 PM, Davide Pozza <davide.pozza@gmail.com> wrote:
>> Hello
>>
>> I'm trying to understand how to develop an item-based recommendation module
>> for an ecommerce website.
>>
>> Here's my input data.csv file format:
>>
>> USER_ID,ITEM_ID
>>
>> (the data comes from the order history, so I don't have any ratings to use)
>>
>> If I understand the documentation correctly, the following implementations
>> should be equivalent (the first one just uses the precomputed data), but
>> they return different results.
>> Could anyone help me to understand the reason?
>>
>> FIRST IMPLEMENTATION
>> ====================
>> // Input format: user_id,item_id
>> DataModel dataModel = new FileDataModel(new File("data.csv"));
>>
>> // Precomputed data generated by ItemSimilarityJob with SIMILARITY_LOGLIKELIHOOD
>> ItemSimilarity similarity =
>>     new FileItemSimilarity(new File("precomputed_data"));
>>
>> GenericItemBasedRecommender recommender =
>>     new GenericItemBasedRecommender(dataModel, similarity);
>>
>> long userId = 8500003;
>> List<RecommendedItem> recommendations =
>>     recommender.recommend(userId , 5);
>> for (RecommendedItem recommendation : recommendations){
>>     System.out.println(recommendation);
>> }
>>
>> ==RESULT==
>> RecommendedItem[item:1653, value:1.0]
>> RecommendedItem[item:14, value:1.0]
>> RecommendedItem[item:1592, value:1.0]
>> RecommendedItem[item:25, value:1.0]
>> RecommendedItem[item:43, value:1.0]
>>
>> SECOND IMPLEMENTATION
>> ======================
>> // Input format: user_id,item_id
>> DataModel dataModel = new FileDataModel(new File("data.csv"));
>>
>> ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
>>
>> GenericItemBasedRecommender recommender =
>>     new GenericItemBasedRecommender(dataModel, similarity);
>>
>> long userId = 8500003;
>> List<RecommendedItem> recommendations =
>>     recommender.recommend(userId, 5);
>> for (RecommendedItem recommendation : recommendations) {
>>     System.out.println(recommendation);
>> }
>>
>> ==RESULT==
>> RecommendedItem[item:28, value:1.0]
>> RecommendedItem[item:14, value:1.0]
>> RecommendedItem[item:20, value:1.0]
>> RecommendedItem[item:21, value:1.0]
>> RecommendedItem[item:25, value:1.0]
>>
>> --
>> Davide Pozza
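
The explanation given in the thread can be checked with a few lines of plain
Java. This is a sketch under assumptions, not Mahout's code: it assumes the
rating-based recommender estimates a preference as a similarity-weighted
average of the user's ratings, and that the boolean variant scores a candidate
by summing similarities instead. The class and method names and the
similarity values are invented for illustration.

```java
class BooleanEstimateSketch {

    // Assumed rating-based estimate: similarity-weighted average of the ratings.
    static double weightedAverage(double[] similarities, double[] ratings) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            numerator += similarities[i] * ratings[i];
            denominator += similarities[i];
        }
        return numerator / denominator;
    }

    // Assumed boolean-data score: plain sum of the similarities.
    static double similaritySum(double[] similarities) {
        double total = 0.0;
        for (double s : similarities) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        // The user bought three items; with boolean data every "rating" is 1.
        double[] ratings = {1.0, 1.0, 1.0};
        // Made-up similarities of two candidates to those three items.
        double[] strongCandidate = {0.9, 0.8, 0.7};
        double[] weakCandidate = {0.1, 0.2, 0.05};

        // Both averages are 1.0, so the candidates cannot be told apart.
        System.out.println("average, strong: " + weightedAverage(strongCandidate, ratings));
        System.out.println("average, weak:   " + weightedAverage(weakCandidate, ratings));

        // The sums differ, so ranking by them is meaningful.
        System.out.println("sum, strong: " + similaritySum(strongCandidate));
        System.out.println("sum, weak:   " + similaritySum(weakCandidate));
    }
}
```

With every rating equal to 1, the weighted average is 1.0 for any candidate,
which matches the value:1.0 on every line of both result lists above. The
order among such tied candidates carries no information.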

