From: Antony ADOPO
To: "user@mahout.apache.org"
Subject: RE: HELP for implicit data feed back - beginner
Date: Sat, 23 Nov 2013 13:06:46 +0100

OK, thank you very much. I'll set up my first Mahout environment, try what
you advised, and come back to you after the first results.

Thank you very much.

-----Original Message-----
From: "Sebastian Schelter"
Sent: 23/11/2013 10:18
To: "user@mahout.apache.org"
Subject: Re: HELP for implicit data feed back - beginner

Hi Antony,

In my experience, using such content-based features tends to make the
recommendations worse. But of course, this can be different in your case.
I suggest you start with a basic item-based recommender that ignores user
descriptions. In your production system, you should create the
functionality to run A/B tests, so that you can test different
recommenders and evaluate them according to some business metric. If you
have this machinery set up, you can easily test more complicated
recommenders (such as one that leverages user descriptions) and see if
they perform better than standard ones.

--sebastian

On 23.11.2013 01:09, Antony Adopo wrote:
> Ok, thanks.
> But is there a combined scenario that uses both the user description
> (job, category) and (customerid,itemid) to get more accurate
> recommendations? For example, in the case where I use a user-based
> recommender system?
>
>
> 2013/11/23 Sebastian Schelter
>
>> Antony,
>>
>> You don't need numeric ratings or preferences for your recommender. I
>> would suggest you start by using
>>
>> o.a.m.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender
>>
>> which has explicitly been built to support scenarios without ratings. I
>> would further suggest using
>>
>> o.a.m.cf.taste.impl.similarity.LogLikelihoodSimilarity
>>
>> as the similarity measure.
>>
>> Best,
>> Sebastian
>>
>>
>> On 22.11.2013 22:37, Antony Adopo wrote:
>>> OK, thank you so much. I will start like this and afterwards try some
>>> tricks to increase accuracy.
>>>
>>>
>>> 2013/11/22 Manuel Blechschmidt
>>>
>>>> Hello Antony,
>>>> you can use the following project as a starting point:
>>>> https://github.com/ManuelB/facebook-recommender-demo
>>>>
>>>> Further, you can purchase support for Mahout from many companies, e.g.
>>>> MapR, Apaxo or Cloudera.
>>>>
>>>> For implicit feedback just use a 1 as preference and the
>>>> LogLikelihoodSimilarity.
>>>>
>>>> Hope that helps
>>>> Manuel
>>>>
>>>> On 22.11.2013, at 16:22, Antony Adopo wrote:
>>>>
>>>>> Thanks.
>>>>> I've already seen this, but my question is: does Mahout offer a
>>>>> collaborative filtering function that is not based on preferences?
>>>>> Or how can I model this with purchases?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> 2013/11/22 Smith, Dan
>>>>>
>>>>>> Hi Anthony,
>>>>>>
>>>>>> I would suggest looking into the collaborative filtering functions.
>>>>>> It will work best if you have your customers segmented into similar
>>>>>> groups, such as those that buy high end goods vs low end.
>>>>>>
>>>>>> _Dan
>>>>>>
>>>>>> On 11/22/13 11:04 AM, "Antony Adopo" wrote:
>>>>>>
>>>>>>> Ok.
>>>>>>> Thanks for answering so quickly.
>>>>>>>
>>>>>>> I forgot to mention that the customer table has a "job" variable,
>>>>>>> and I assumed that this variable would also be needed for accurate
>>>>>>> recommendations. Anyway:
>>>>>>>
>>>>>>> I have around 200 000 customers.
>>>>>>> My order table has around 12 000 000 orders,
>>>>>>> and I have around 2 000 000 distinct (customerid,itemid) tuples.
>>>>>>> About the (customerID,itemID) tuples: when I read the Mahout or
>>>>>>> recommender system literature, they use
>>>>>>> (customerID,itemID,*preference*) and I don't have *preference*.
>>>>>>> So is there a Mahout method or class that handles only
>>>>>>> (customerID,itemID) data?
>>>>>>> And is it possible to use external data such as job or RFM analysis
>>>>>>> to get something more accurate?
>>>>>>>
>>>>>>> Sorry (for about 2 weeks I have had a headache over how to organize
>>>>>>> all of this to build a great system). Propose your solutions and
>>>>>>> then we'll see.
>>>>>>>
>>>>>>>
>>>>>>> 2013/11/22 Sebastian Schelter
>>>>>>>
>>>>>>>> Hi Antony,
>>>>>>>>
>>>>>>>> I would start with a simple approach: extract all customerID,itemID
>>>>>>>> tuples from the orders table and use them as your input data. How
>>>>>>>> many of those do you have? The data size will dictate whether you
>>>>>>>> need to employ a distributed approach to recommendation mining or
>>>>>>>> not.
>>>>>>>>
>>>>>>>> --sebastian
>>>>>>>>
>>>>>>>> On 22.11.2013 19:21, Antony Adopo wrote:
>>>>>>>>> Morning,
>>>>>>>>>
>>>>>>>>> My name is Antony and I have a great recommender system to build.
>>>>>>>>>
>>>>>>>>> I'm totally new to recommender systems. After reading all the
>>>>>>>>> scientific papers, I didn't find relevant information to build mine.
>>>>>>>>>
>>>>>>>>> OK, my problem:
>>>>>>>>>
>>>>>>>>> I have to build a recommender system for a retail company that
>>>>>>>>> sells building products.
>>>>>>>>>
>>>>>>>>> I don't have explicit data (ratings).
>>>>>>>>>
>>>>>>>>> I only have data about purchases: all transactions, orders and
>>>>>>>>> dates, as follows.
>>>>>>>>>
>>>>>>>>> Orders table
>>>>>>>>>
>>>>>>>>> CustomerID
>>>>>>>>> Sales_ID
>>>>>>>>> Item_ID
>>>>>>>>> Dates
>>>>>>>>> Amount
>>>>>>>>> quantity
>>>>>>>>> channel_type (phone, mail, etc.)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I also have specific information about users
>>>>>>>>>
>>>>>>>>> Users table
>>>>>>>>> CustomerID
>>>>>>>>> Group (engaged, frequent,buyer, newyer, etc.)
>>>>>>>>>
>>>>>>>>> ... and products
>>>>>>>>>
>>>>>>>>> Item_ID
>>>>>>>>> Item_name
>>>>>>>>> Iteem_parent (hierarchy)
>>>>>>>>>
>>>>>>>>> I don't know how to use all this information with Mahout (or other
>>>>>>>>> tools or methods) to build a good recommendation system (all the
>>>>>>>>> ones presented are based on ratings, and all the Mahout systems I
>>>>>>>>> have seen are also based on ratings or preferences).
>>>>>>>>>
>>>>>>>>> At the beginning, I thought I had to use classical data mining
>>>>>>>>> methods such as clustering or association rules, but accurately
>>>>>>>>> recommending n products out of 2000 products grouped under about
>>>>>>>>> 300 hierarchical parents (not linked to domain) becomes difficult
>>>>>>>>> with classical data mining.
>>>>>>>>> That is the reason I turned to recommender systems.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Please help.
>>>>>>>>> Thanks
>>>>>>>>>
>>>>
>>>> --
>>>> Manuel Blechschmidt
>>>> M.Sc. IT Systems Engineering
>>>> Dortustr. 57
>>>> 14467 Potsdam
>>>> Mobile: 0173/6322621
>>>> Twitter: http://twitter.com/Manuel_B
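
As a rough illustration of the advice in this thread (treat distinct
(customerID,itemID) pairs as boolean preferences and compare items with a
log-likelihood similarity), here is a minimal, self-contained Java sketch.
It does not use Mahout itself. The class name ImplicitFeedbackSketch, its
method names and the sample order rows are all made up for this example.
Returning 0 for items with no common buyer and mapping the ratio to
1 - 1/(1 + LLR) are assumptions of the sketch, not a statement about
Mahout's internals.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; it is not part of Mahout.
final class ImplicitFeedbackSketch {

    // Collapse raw order rows {customerID, itemID} into itemID -> distinct buyers.
    // Repeated purchases of the same item by the same customer count once.
    static Map<Long, Set<Long>> buyersByItem(long[][] orders) {
        Map<Long, Set<Long>> result = new HashMap<>();
        for (long[] order : orders) {
            result.computeIfAbsent(order[1], k -> new HashSet<>()).add(order[0]);
        }
        return result;
    }

    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy term: xLogX(total) minus the sum of xLogX(count).
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            sum += c;
            parts += xLogX(c);
        }
        return xLogX(sum) - parts;
    }

    // Log-likelihood ratio (G^2 statistic) of a 2x2 contingency table:
    // k11 = bought both, k12 = only A, k21 = only B, k22 = neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matrixEntropy));
    }

    // Item-item similarity from buyer sets; larger means stronger association.
    static double similarity(Set<Long> buyersA, Set<Long> buyersB, long numCustomers) {
        Set<Long> both = new HashSet<>(buyersA);
        both.retainAll(buyersB);
        long k11 = both.size();
        if (k11 == 0) {
            return 0.0; // sketch-level choice: no common buyer, no similarity
        }
        long k12 = buyersA.size() - k11;
        long k21 = buyersB.size() - k11;
        long k22 = numCustomers - buyersA.size() - buyersB.size() + k11;
        double llr = logLikelihoodRatio(k11, k12, k21, k22);
        return 1.0 - 1.0 / (1.0 + llr); // assumed mapping of LLR into [0, 1)
    }

    public static void main(String[] args) {
        // Made-up order rows: {customerID, itemID}
        long[][] orders = { {1, 10}, {1, 10}, {1, 11}, {2, 10}, {2, 11}, {3, 12}, {4, 12} };
        Map<Long, Set<Long>> buyers = buyersByItem(orders);
        System.out.println("sim(10, 11) = " + similarity(buyers.get(10L), buyers.get(11L), 4));
        System.out.println("sim(10, 12) = " + similarity(buyers.get(10L), buyers.get(12L), 4));
    }
}
```

With roughly 2 000 000 distinct tuples, as mentioned earlier in the thread,
one would presumably rely on the GenericBooleanPrefItemBasedRecommender and
LogLikelihoodSimilarity classes named above rather than hand-written code
like this.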