mahout-user mailing list archives

From Sean Owen <>
Subject Re: How create a recommended system?
Date Fri, 21 Sep 2012 15:06:07 GMT
It's sometimes difficult to define the recommendation problem, given a
lot of possible data to work with.

I take it that you are trying to recommend activities to people here.
But you're also talking a lot about computing similarities between
people, to cluster them, and between activities. Do you need these as
output, or do you think you need them as part of a recommendation
process (you don't necessarily)?
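
To illustrate that similarities can stay internal to the recommendation process, here is a minimal plain-Python sketch of item-based scoring (not Mahout's API): a user's estimated preference for an unseen activity is the similarity-weighted average of the preferences they already expressed. The activity names, preference values, similarity table, and the helpers `predict` and `recommend` are all hypothetical.

```python
# Item-based scoring sketch: activity-activity similarities are consumed
# internally to rank unseen activities; they are never part of the output.
# All activity names, preferences, and similarity values are hypothetical.


def predict(user_prefs, similarity, candidate):
    """Estimate the user's preference for `candidate` as the similarity-weighted
    average of the preferences the user already expressed (positive
    similarities only). Returns None if no rated activity is similar."""
    weighted_sum = 0.0
    weight_total = 0.0
    for activity, pref in user_prefs.items():
        # Each pair is stored once, in either order, so try both orders.
        sim = similarity.get((candidate, activity),
                             similarity.get((activity, candidate), 0.0))
        if sim > 0:
            weighted_sum += sim * pref
            weight_total += sim
    return weighted_sum / weight_total if weight_total > 0 else None


def recommend(user_prefs, similarity, all_activities, how_many):
    """Score every activity the user has not rated and return the top ones."""
    scored = []
    for candidate in all_activities:
        if candidate in user_prefs:
            continue  # only recommend activities the user has not rated yet
        estimate = predict(user_prefs, similarity, candidate)
        if estimate is not None:
            scored.append((estimate, candidate))
    scored.sort(reverse=True)
    return [candidate for _, candidate in scored[:how_many]]


prefs_of_user_100 = {"football": 0.9, "opera": 0.2}
sims = {("football", "basketball"): 0.8, ("opera", "basketball"): 0.1,
        ("opera", "theatre"): 0.7, ("football", "theatre"): 0.05}
activities = ["football", "basketball", "opera", "theatre"]
print(recommend(prefs_of_user_100, sims, activities, 2))  # → ['basketball', 'theatre']
```

The output is just a ranked list of activities; the similarity values are only a means to produce it.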

I think you mean that you want a similarity function between
activities rather than a distance function. Just create a DataModel with
your user-activity data, and then use any ItemSimilarity
implementation to find the similarity between any pair of activities.
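
A minimal plain-Python sketch of what an item-item similarity computes from user-activity data; it is not Mahout's `ItemSimilarity` code. It assumes the data is stored as comma-separated `userID,itemID,preference` lines, the layout Mahout's `FileDataModel` reads, and uses plain cosine similarity, which is only one of several possible measures. The preference values and the helpers `load_prefs` and `cosine` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical user-activity data in userID,itemID,preference form.
PREFERENCE_LINES = """\
100,500,0.5
100,501,0.7
200,500,0.9
200,501,0.8
200,300,0.9
300,300,0.4
300,501,0.1
"""


def load_prefs(text):
    """Group the lines into activity -> {user: preference}."""
    by_activity = defaultdict(dict)
    for line in text.strip().splitlines():
        user, activity, pref = line.split(",")
        by_activity[int(activity)][int(user)] = float(pref)
    return by_activity


def cosine(prefs_a, prefs_b):
    """Cosine similarity between two activities' preference vectors;
    a user who did not rate an activity counts as 0 for it."""
    dot = sum(prefs_a[u] * prefs_b[u] for u in prefs_a.keys() & prefs_b.keys())
    norm_a = math.sqrt(sum(v * v for v in prefs_a.values()))
    norm_b = math.sqrt(sum(v * v for v in prefs_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


prefs = load_prefs(PREFERENCE_LINES)
print(round(cosine(prefs[500], prefs[501]), 3))  # → 0.973
```

Note that nothing about the activities themselves (sport, opera, and so on) enters the calculation; two activities are similar only because the same users expressed similar preferences for them.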

Clustering users is a different problem entirely. If you just want
user-user similarity, you can similarly use the same data and a
UserSimilarity implementation to find any user-user similarity. But you are
talking about using user features to compute this similarity. This
isn't a collaborative filtering problem anymore, then.
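
User-user similarity can be computed the same way, from preference data alone. A minimal plain-Python sketch using Pearson correlation over the activities two users both rated; this is one common choice, not Mahout's code (Mahout ships several `UserSimilarity` implementations). No age or gender enters the calculation, which is what keeps it collaborative filtering. The users, activities, values, and the helper `pearson` are hypothetical.

```python
import math


def pearson(prefs_u, prefs_v):
    """Pearson correlation between two users over the activities both rated.
    Returns None if they share fewer than two activities or if either user's
    preferences on the shared activities are (numerically) constant."""
    common = sorted(prefs_u.keys() & prefs_v.keys())
    if len(common) < 2:
        return None
    xs = [prefs_u[a] for a in common]
    ys = [prefs_v[a] for a in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x < 1e-12 or var_y < 1e-12:
        return None
    return cov / math.sqrt(var_x * var_y)


# Hypothetical users: only activity preferences are used, no age or gender.
user_100 = {"football": 0.9, "basketball": 0.8, "opera": 0.1}
user_200 = {"football": 0.7, "basketball": 0.9, "opera": 0.2, "theatre": 0.6}
print(round(pearson(user_100, user_200), 3))  # → 0.923
```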

Maybe you can first clarify what you want to do.

On Fri, Sep 21, 2012 at 1:48 PM, kostas_new <> wrote:
> Hello,
> I am programming a recommendation system as a course project, in
> order to propose activities for a specific person.
> I have installed Mahout and Hadoop in order to do that.
> The attributes that play an important role in the recommendation system are
> the following:
> 1) all the attributes of each person (e.g. age, gender, his/her
> preferences for different types of activity)
> 2) the core activities (type of activity, target_group (a numerical
> attribute))
> The numeric attribute is not a problem because it is only a number. As a
> result, I would like to declare a "distance function" between the
> different activities; for example, the relationship between football and
> basketball should be strong (a small distance), because both are part of
> the sports category, whereas the distance between basketball and opera
> should be larger. Besides a single characterization of each activity,
> I would prefer to characterize each activity by several types, for example
> activity X is an opera with an educational character. That is a
> multi-dimensional set of nominal attributes.
> *Question*
> How would you recommend I implement the recommendation process?
> (*Q1*)
> On the one hand, I am thinking of clustering the users. The
> clustering must take into consideration, for example, the age (e.g. 33 years
> old) and the preferences, for example user 1023 prefers to go to B1
> (activity type = B2). At that point, creating the vector is a headache
> for me because I have to measure the distance between the different
> activities while also taking the user’s preferences into account. (*Q2*)
> On the other hand, because I know the exact number and features of the
> activities, I don’t think it is necessary to cluster
> them.
> As the last step, I want to use collaborative filtering for my
> recommendations. For example, the input table will have the format:
> User_id    activity_id    Preference (0..1)
> 100        500            0.5
> 200        300            0.9
> I know that I could use only this table for my recommendations, but I want
> to take advantage of the user preferences and of the dependencies between
> the different types of activities, which could hypothetically be siblings.
> Thank you very much for your time. Unfortunately, I have spent many months
> trying to find a solution.
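
The activity "distance function" described in the question, built from several nominal attributes per activity, is a content-based measure rather than collaborative filtering. One simple choice, not proposed anywhere in the thread, is Jaccard similarity over each activity's set of category tags, with 1 minus that value as the distance. A minimal sketch; the tags and the helpers `jaccard` and `distance` are hypothetical.

```python
def jaccard(tags_a, tags_b):
    """Jaccard similarity of two tag sets: size of the intersection divided
    by size of the union. 0 means no shared tags, 1 means identical sets."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)


# Hypothetical tags: each activity may carry several nominal attributes.
ACTIVITY_TAGS = {
    "football": {"sport", "team", "outdoor"},
    "basketball": {"sport", "team", "indoor"},
    "opera": {"music", "performance", "education", "indoor"},
}


def distance(activity_a, activity_b):
    """Turn the similarity into a distance in [0, 1]."""
    return 1.0 - jaccard(ACTIVITY_TAGS[activity_a], ACTIVITY_TAGS[activity_b])


print(distance("football", "basketball"))        # → 0.5
print(round(distance("basketball", "opera"), 3))  # → 0.833
```

Jaccard treats every tag as equally important; if the category (sport versus music) should count more than, say, indoor versus outdoor, a weighted variant would be needed.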
