mahout-user mailing list archives

From Antony ADOPO <saius...@gmail.com>
Subject RE: HELP for implicit data feed back - beginner
Date Sat, 23 Nov 2013 12:06:46 GMT
OK, thank you very much. I'll set up my first Mahout environment, try what you advised, and
get back to you after the first results.
Thanks again

-----Original Message-----
From: "Sebastian Schelter" <ssc.open@googlemail.com>
Sent: 23/11/2013 10:18
To: "user@mahout.apache.org" <user@mahout.apache.org>
Subject: Re: HELP for implicit data feed back - beginner

Hi Antony,

In my experience, using such content-based features tends to make the
recommendations worse. But of course, this can be different in your case.

I suggest you start with a basic item-based recommender that ignores
user descriptions. In your production system, you should create the
functionality to run A/B tests, so that you can test different
recommenders and evaluate them according to some business metric. If you
have this machinery set up, you can easily test more complicated
recommenders (such as one that leverages user descriptions) and see if
they perform better than standard ones.

--sebastian
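Sebastian's A/B-testing advice needs a stable way to route each customer to one recommender variant. A minimal sketch, assuming numeric customer IDs and two hypothetical variant names (itemBased and itemBasedWithUserInfo): hashing the ID, rather than drawing a random number per request, keeps every customer in the same variant across visits, so each group's business metric reflects one recommender only.

```java
import java.util.List;

final class AbBucketing {

  // Hypothetical variant names; replace them with the recommenders actually being compared.
  static final List<String> VARIANTS = List.of("itemBased", "itemBasedWithUserInfo");

  // Deterministically maps a customer to a variant: the same customerID (and the
  // same experiment name) always yields the same bucket.
  static String variantFor(long customerId, String experimentName) {
    // Mixing in the experiment name lets different experiments split customers independently.
    long h = mix(customerId ^ experimentName.hashCode());
    int bucket = (int) Math.floorMod(h, (long) VARIANTS.size());
    return VARIANTS.get(bucket);
  }

  // SplitMix64 finalizer: spreads consecutive IDs evenly over the buckets.
  static long mix(long z) {
    z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
    z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
    return z ^ (z >>> 31);
  }

  public static void main(String[] args) {
    int[] counts = new int[VARIANTS.size()];
    for (long id = 1; id <= 200_000; id++) {
      counts[VARIANTS.indexOf(variantFor(id, "reco-test-1"))]++;
    }
    // Roughly half of the customers should land in each variant.
    System.out.println(counts[0] + " / " + counts[1]);
  }
}
```

Each served recommendation would then be logged together with its variant, so that the chosen business metric (for example, the purchase rate of recommended items) can be compared between the two groups.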

On 23.11.2013 01:09, Antony Adopo wrote:
> Ok, thanks.
> But is there a way to combine user descriptions (job, category) with the
> (customerID, itemID) data to make recommendations more accurate?
> For example, if I used a user-based recommender system?
> 
> 
> 2013/11/23 Sebastian Schelter <ssc.open@googlemail.com>
> 
>> Antony,
>>
>> You don't need numeric ratings or preferences for your recommender. I
>> would suggest you start by using
>>
>> o.a.m.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender
>>
>> which has explicitly been built to support scenarios without ratings. I
>> would further suggest using
>>
>> o.a.m.cf.taste.impl.similarity.LogLikelihoodSimilarity
>>
>> as similarity measure.
>>
>> Best,
>> Sebastian
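To show what the two suggested classes do conceptually, here is a small pure-Java sketch; it is not Mahout code and does not use Mahout's API. It computes Dunning's log-likelihood ratio (LLR) from purchase co-occurrence counts, maps it to a similarity as 1 - 1/(1 + LLR) (our understanding of how LogLikelihoodSimilarity scales it), and scores each item a customer has not bought by summing its similarities to the items the customer did buy, which is the general idea behind a boolean-preference item-based recommender. The class name BooleanItemBasedSketch and the toy data are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class BooleanItemBasedSketch {

  // x * ln(x), with 0 * ln(0) defined as 0.
  static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  // Unnormalized Shannon entropy: N * H for counts that sum to N.
  static double entropy(long... counts) {
    long sum = 0;
    double sumXLogX = 0.0;
    for (long c : counts) {
      sumXLogX += xLogX(c);
      sum += c;
    }
    return xLogX(sum) - sumXLogX;
  }

  // LLR for a 2x2 table: k11 = bought A and B, k12 = A only, k21 = B only, k22 = neither.
  static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    // Clamp tiny negative values caused by floating-point round-off.
    return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
  }

  static double similarity(Set<String> buyersA, Set<String> buyersB, long numUsers) {
    long both = buyersA.stream().filter(buyersB::contains).count();
    // Raw LLR is also high for pairs bought together LESS often than chance;
    // this sketch treats those (and never-co-bought pairs) as unrelated.
    if (both * numUsers <= (long) buyersA.size() * buyersB.size()) {
      return 0.0;
    }
    long onlyA = buyersA.size() - both;
    long onlyB = buyersB.size() - both;
    long neither = numUsers - buyersA.size() - buyersB.size() + both;
    return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(both, onlyA, onlyB, neither));
  }

  // Scores every item the customer has not bought by summing its similarity to the owned items.
  static Map<String, Double> recommend(String user, Map<String, Set<String>> itemsByUser) {
    // Invert customer -> items into item -> buyers.
    Map<String, Set<String>> buyersByItem = new HashMap<>();
    itemsByUser.forEach((customer, items) -> {
      for (String item : items) {
        buyersByItem.computeIfAbsent(item, k -> new HashSet<>()).add(customer);
      }
    });
    long numUsers = itemsByUser.size();
    Set<String> owned = itemsByUser.getOrDefault(user, Set.of());
    Map<String, Double> scores = new TreeMap<>();
    for (Map.Entry<String, Set<String>> candidate : buyersByItem.entrySet()) {
      if (owned.contains(candidate.getKey())) {
        continue; // only recommend items the customer has not bought yet
      }
      double score = 0.0;
      for (String item : owned) {
        score += similarity(candidate.getValue(), buyersByItem.get(item), numUsers);
      }
      if (score > 0.0) {
        scores.put(candidate.getKey(), score);
      }
    }
    return scores;
  }

  public static void main(String[] args) {
    // Toy customerID -> purchased itemIDs data; every purchase counts as an implicit "1".
    Map<String, Set<String>> itemsByUser = Map.of(
        "c1", Set.of("cement", "trowel"),
        "c2", Set.of("cement", "trowel", "gloves"),
        "c3", Set.of("paint", "brush"),
        "c4", Set.of("cement"));
    System.out.println(recommend("c4", itemsByUser));
  }
}
```

For customer c4, trowel scores higher than gloves because it was co-bought with cement more often, while paint and brush are dropped. The sign check is a choice made for this sketch: the raw ratio is high both for pairs bought together more often than chance and for pairs bought together less often, and only the first case is useful for recommending.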
>>
>>
>> On 22.11.2013 22:37, Antony Adopo wrote:
>>> OK, thank you so much. I will start like this and afterwards try some tricks
>>> to increase accuracy.
>>>
>>>
>>> 2013/11/22 Manuel Blechschmidt <Manuel.Blechschmidt@gmx.de>
>>>
>>>> Hello Antony,
>>>> you can use the following project as a starting point:
>>>> https://github.com/ManuelB/facebook-recommender-demo
>>>>
>>>> Furthermore, you can purchase support for Mahout from many companies,
>>>> e.g. MapR, Apaxo, or Cloudera.
>>>>
>>>> For implicit feedback just use a 1 as preference and the
>>>> LogLikelihoodSimilarity.
>>>>
>>>> Hope that helps
>>>>     Manuel
>>>>
>>>> On 22.11.2013, at 16:22, Antony Adopo wrote:
>>>>
>>>>> Thanks.
>>>>> I've already seen this, but my question is: does Mahout offer collaborative
>>>>> filtering functions that are not based on preferences? Or how should I model
>>>>> preferences from purchases?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> 2013/11/22 Smith, Dan <Dan.Smith@disney.com>
>>>>>
>>>>>> Hi Anthony,
>>>>>>
>>>>>> I would suggest looking into the collaborative filtering functions. It
>>>>>> will work best if you have your customers segmented into similar groups,
>>>>>> such as those that buy high-end goods vs. low-end goods.
>>>>>>
>>>>>> _Dan
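Dan's suggestion of segmenting customers can be sketched as partitioning the purchase data by a segment label and then building one recommender per partition. A minimal sketch, assuming every customer carries a single segment label (for example the Group column of the Users table described in Antony's first message, quoted further down); the Purchase record, the bySegment method, and the "unknown" fallback label are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class SegmentSplit {

  // One implicit-feedback observation: this customer bought this item.
  record Purchase(long customerId, long itemId) {}

  // Groups purchases by the segment of the customer who made them, so that a separate
  // recommender can be trained per segment (e.g. high-end vs. low-end buyers).
  // Customers with no known segment are collected under "unknown".
  static Map<String, List<Purchase>> bySegment(List<Purchase> purchases, Map<Long, String> segmentOf) {
    return purchases.stream().collect(Collectors.groupingBy(
        p -> segmentOf.getOrDefault(p.customerId(), "unknown"),
        TreeMap::new,
        Collectors.toList()));
  }

  public static void main(String[] args) {
    Map<Long, String> segmentOf = Map.of(1L, "high-end", 2L, "low-end", 3L, "high-end");
    List<Purchase> purchases = List.of(
        new Purchase(1, 100), new Purchase(2, 200), new Purchase(3, 100), new Purchase(4, 300));
    bySegment(purchases, segmentOf).forEach((segment, rows) ->
        System.out.println(segment + " -> " + rows.size() + " purchases"));
  }
}
```

The trade-off is that each segment's recommender sees less data, so very small segments may produce weaker similarities than one model trained on everything.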
>>>>>>
>>>>>> On 11/22/13 11:04 AM, "Antony Adopo" <saius1er@gmail.com> wrote:
>>>>>>
>>>>>>> OK, thanks for answering so quickly.
>>>>>>>
>>>>>>> I forgot to mention that the customer table has a "job" variable, and I
>>>>>>> assumed this variable would also be needed for accurate recommendations.
>>>>>>> Anyway:
>>>>>>>
>>>>>>> I have around 200,000 customers.
>>>>>>> My order table has around 12,000,000 orders,
>>>>>>> and I have around 2,000,000 distinct (customerID,itemID) tuples.
>>>>>>> Regarding the (customerID,itemID) tuples: the Mahout and recommender
>>>>>>> system literature I have read uses (customerID,itemID,*preference*)
>>>>>>> triples, and I don't have a *preference*.
>>>>>>> So is there a Mahout method or class that handles only
>>>>>>> (customerID,itemID) data?
>>>>>>> And is it possible to use external data such as the job or an RFM
>>>>>>> analysis to get something more accurate?
>>>>>>>
>>>>>>> Sorry (for about 2 weeks I've been struggling with how to organize all
>>>>>>> of this to build a great system). Propose your solutions and then we'll
>>>>>>> see.
>>>>>>>
>>>>>>>
>>>>>>> 2013/11/22 Sebastian Schelter <ssc.open@googlemail.com>
>>>>>>>
>>>>>>>> Hi Antony,
>>>>>>>>
>>>>>>>> I would start with a simple approach: extract all customerID,itemID
>>>>>>>> tuples from the orders table and use them as your input data. How many
>>>>>>>> of those do you have? The data size will dictate whether you need to
>>>>>>>> employ a distributed approach to recommendation mining or not.
>>>>>>>>
>>>>>>>> --sebastian
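Sebastian's first step, reducing the orders table to distinct customerID,itemID tuples, might look like the following pure-Java sketch. It assumes the order rows are already in memory as a hypothetical OrderRow record mirroring the Orders columns listed in Antony's first message (quoted below). Repeated purchases of the same item collapse into one tuple, and each tuple gets a constant 1 as its preference, following Manuel's suggestion above. The output uses a comma-separated userID,itemID,preference layout, which to our understanding is what Mahout's FileDataModel reads; against a database, a SELECT DISTINCT on the two ID columns would do the same job.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ExtractTuples {

  // Hypothetical in-memory view of one row of the Orders table.
  record OrderRow(long customerId, long salesId, long itemId, LocalDate date,
                  double amount, int quantity, String channelType) {}

  // Collapses order rows into distinct (customerID, itemID) pairs and renders each
  // as a "customerID,itemID,1" line: every purchase becomes an implicit preference of 1.
  static List<String> toBooleanPrefLines(List<OrderRow> orders) {
    // LinkedHashSet drops repeated purchases while keeping first-seen order.
    Set<String> lines = new LinkedHashSet<>();
    for (OrderRow o : orders) {
      lines.add(o.customerId() + "," + o.itemId() + ",1");
    }
    return new ArrayList<>(lines);
  }

  public static void main(String[] args) {
    List<OrderRow> orders = List.of(
        new OrderRow(1, 10, 100, LocalDate.of(2013, 11, 1), 25.0, 2, "phone"),
        new OrderRow(1, 11, 100, LocalDate.of(2013, 11, 8), 12.5, 1, "mail"),
        new OrderRow(2, 12, 200, LocalDate.of(2013, 11, 9), 40.0, 4, "phone"));
    toBooleanPrefLines(orders).forEach(System.out::println);
    // prints:
    // 1,100,1
    // 2,200,1
  }
}
```

Amount, quantity, date, and channel are ignored here on purpose; they could be revisited later (for example in an A/B test) once the plain boolean version works.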
>>>>>>>>
>>>>>>>> On 22.11.2013 19:21, Antony Adopo wrote:
>>>>>>>>> Good morning,
>>>>>>>>>
>>>>>>>>> My name is Antony, and I have a large recommender system to build.
>>>>>>>>>
>>>>>>>>> I'm totally new to recommender systems. After reading all the
>>>>>>>>> scientific papers, I didn't find relevant information for building mine.
>>>>>>>>>
>>>>>>>>> OK, my problem:
>>>>>>>>>
>>>>>>>>> I have to build a recommender system for a retailer that sells
>>>>>>>>> building products.
>>>>>>>>>
>>>>>>>>> I don't have explicit data (ratings).
>>>>>>>>>
>>>>>>>>> I only have data about purchases: all transactions, orders, and
>>>>>>>>> dates, as follows:
>>>>>>>>>
>>>>>>>>> Orders table
>>>>>>>>>
>>>>>>>>> CustomerID
>>>>>>>>> Sales_ID
>>>>>>>>> Item_ID
>>>>>>>>> Dates
>>>>>>>>> Amount
>>>>>>>>> quantity
>>>>>>>>> channel_type (phone, mail, etc.)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I also have specific information about users:
>>>>>>>>>
>>>>>>>>> Users table
>>>>>>>>> CustomerID
>>>>>>>>> Group (engaged, frequent,buyer, newyer, etc.)
>>>>>>>>>
>>>>>>>>> ... and about products:
>>>>>>>>>
>>>>>>>>> Item_ID
>>>>>>>>> Item_name
>>>>>>>>> Item_parent (hierarchy)
>>>>>>>>>
>>>>>>>>> I don't know how to use all this information with Mahout (or other
>>>>>>>>> tools or methods) to build a good recommendation system (all existing
>>>>>>>>> examples are based on ratings, and all the Mahout systems I have seen
>>>>>>>>> are also based on ratings or preferences).
>>>>>>>>>
>>>>>>>>> At the beginning, I thought I had to use classical data mining methods
>>>>>>>>> such as clustering or association rules, but accurately recommending
>>>>>>>>> n products out of 2,000 products, grouped into about 300 hierarchical
>>>>>>>>> parents (not linked to the domain), becomes difficult with classical
>>>>>>>>> data mining. That is why I turned to recommender systems.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Please help.
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Manuel Blechschmidt
>>>> M.Sc. IT Systems Engineering
>>>> Dortustr. 57
>>>> 14467 Potsdam
>>>> Mobil: 0173/6322621
>>>> Twitter: http://twitter.com/Manuel_B
>>>>
>>>>
>>>
>>
>>
> 

