From: Sean Owen
To: user@mahout.apache.org
Date: Thu, 20 Sep 2012 14:19:55 +0100
Subject: Re: difference between precomputed and on-the-fly processed data

The problem is that you have boolean data with no ratings, so all the
ratings are 1. But you are using GenericItemBasedRecommender, which
expects ratings. It ranks on estimated ratings, and since all ratings
are 1, every candidate gets an estimate of 1 and the order of the
results is essentially random. Use GenericBooleanPrefItemBasedRecommender.

On Thu, Sep 20, 2012 at 2:04 PM, Davide Pozza wrote:
> Hello
>
> I'm trying to understand how to develop an item-based recommendation module
> for an ecommerce website.
>
> Here's my input data.csv file format:
>
> USER_ID,ITEM_ID
>
> (data coming from the orders history, so I don't have any ratings to use)
>
> If I correctly understand the documentation, the following implementations
> should be equivalent (the first one just uses the precomputed data), but
> they return different results.
> Could anyone help me to understand the reason?
>
> FIRST IMPLEMENTATION
> ====================
> DataModel dataModel = new FileDataModel(new File("data.csv")); // FORMAT user_id,item_id
>
> // precomputed data generated by ItemSimilarityJob with SIMILARITY_LOGLIKELIHOOD
> ItemSimilarity similarity = new FileItemSimilarity(new File("precomputed_data"));
>
> GenericItemBasedRecommender recommender =
>     new GenericItemBasedRecommender(dataModel, similarity);
>
> long userId = 8500003;
> List<RecommendedItem> recommendations =
>     recommender.recommend(userId, 5);
> for (RecommendedItem recommendation : recommendations) {
>     System.out.println(recommendation);
> }
>
> ==RESULT==
> RecommendedItem[item:1653, value:1.0]
> RecommendedItem[item:14, value:1.0]
> RecommendedItem[item:1592, value:1.0]
> RecommendedItem[item:25, value:1.0]
> RecommendedItem[item:43, value:1.0]
>
> SECOND IMPLEMENTATION
> ======================
> DataModel dataModel = new FileDataModel(new File("data.csv")); // FORMAT user_id,item_id
>
> ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
>
> GenericItemBasedRecommender recommender =
>     new GenericItemBasedRecommender(dataModel, similarity);
>
> long userId = 8500003;
> List<RecommendedItem> recommendations =
>     recommender.recommend(userId, 5);
> for (RecommendedItem recommendation : recommendations) {
>     System.out.println(recommendation);
> }
>
> ==RESULT==
> RecommendedItem[item:28, value:1.0]
> RecommendedItem[item:14, value:1.0]
> RecommendedItem[item:20, value:1.0]
> RecommendedItem[item:21, value:1.0]
> RecommendedItem[item:25, value:1.0]
>
> --
> Davide Pozza
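
To make the point in the reply concrete, here is a minimal sketch, not
Mahout's actual code, of why ranking on estimated ratings carries no
information when every rating is 1. It assumes a rating-based item
recommender estimates a preference as a similarity-weighted average of
the user's existing ratings, and that a boolean-preference variant
scores a candidate by summing similarities instead. The class name,
method names, and similarity values are hypothetical, chosen only for
illustration.

```java
class BooleanPrefRanking {

    // Assumed rating-style estimate: similarity-weighted average of the
    // user's ratings for the items they already have.
    static double weightedAverageEstimate(double[] similarities, double[] ratings) {
        double weighted = 0.0;
        double total = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            weighted += similarities[i] * ratings[i];
            total += similarities[i];
        }
        return total == 0.0 ? Double.NaN : weighted / total;
    }

    // Assumed boolean-style score: plain sum of similarities between the
    // candidate item and the items the user already has.
    static double similaritySumScore(double[] similarities) {
        double sum = 0.0;
        for (double s : similarities) {
            sum += s;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Boolean data: every known preference is recorded as a rating of 1.
        double[] ratings = {1.0, 1.0, 1.0};

        // Made-up similarities of two candidate items to the user's three items.
        double[] strongCandidate = {0.9, 0.8, 0.7};
        double[] weakCandidate = {0.1, 0.05, 0.2};

        // Both estimates are exactly 1.0, so they cannot separate the candidates.
        System.out.println(weightedAverageEstimate(strongCandidate, ratings));
        System.out.println(weightedAverageEstimate(weakCandidate, ratings));

        // The similarity sums differ, so they give a usable ranking.
        System.out.println(similaritySumScore(strongCandidate));
        System.out.println(similaritySumScore(weakCandidate));
    }
}
```

With all ratings equal to 1, the weighted average divides a sum by
itself, which matches the value:1.0 on every line of both result lists
in the quoted message. In the quoted code, the suggested fix would
amount to constructing GenericBooleanPrefItemBasedRecommender where
GenericItemBasedRecommender is constructed now.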