mahout-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: Item Based Collaborative Filtering Properties Question
Date Thu, 12 Sep 2013 16:01:31 GMT
Hi Brian,

Happy to give you some details:
So, from a matrix A (user x item) that holds user-item interactions,
this algorithm first computes a matrix S (item x item) of item
similarities and afterwards uses these item similarities to compute
recommendations for users.
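
To make the two phases concrete, here is a minimal single-machine sketch in Python. It is not the Mahout implementation: it assumes cosine similarity between item columns and a plain similarity-weighted sum for scoring, and the names item_similarities and recommend are made up for illustration.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(A):
    """A maps user -> {item: value}. Returns S as item -> {other_item: cosine}."""
    dot = defaultdict(float)   # dot products between pairs of item columns
    norm = defaultdict(float)  # squared norm of each item column
    for prefs in A.values():
        for i, vi in prefs.items():
            norm[i] += vi * vi
            for j, vj in prefs.items():
                if i != j:
                    dot[(i, j)] += vi * vj
    S = defaultdict(dict)
    for (i, j), d in dot.items():
        S[i][j] = d / (sqrt(norm[i]) * sqrt(norm[j]))
    return S


def recommend(A, S, user, n=10):
    """Score items the user has not interacted with by summing similarity * value."""
    prefs = A[user]
    scores = defaultdict(float)
    for i, v in prefs.items():
        for j, s in S.get(i, {}).items():
            if j not in prefs:  # skip items the user already knows
                scores[j] += s * v
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


A = {
    "u1": {"a": 1.0, "b": 1.0},
    "u2": {"a": 1.0, "b": 1.0, "c": 1.0},
    "u3": {"b": 1.0, "c": 1.0},
}
S = item_similarities(A)
print(recommend(A, S, "u1"))
```

The distributed job spreads the same two steps over several MapReduce passes; the sketch only shows the data flow from A to S to recommendations.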

The parameters refer to the following:

'maxPrefsPerUserInItemSimilarity': the maximum number of interactions per
user to take into account when computing S (i.e. the maximum number of
entries to look at per row in A, selected at random). A few power users
with an anomalously large number of interactions can heavily increase the
computation time without contributing to the actual quality of the
output. Setting this to something like 500 should give you reasonable
performance and results.
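
A rough Python sketch of this down-sampling, assuming a simple uniform sample per user (the actual sampling code in Mahout may differ). The names sample_prefs and pair_count are hypothetical. In a pairwise co-occurrence computation the number of item pairs per user grows quadratically, which is why capping the row length helps: under that assumption a user with 1000 interactions yields 499,500 pairs, while 500 interactions yield 124,750.

```python
import random


def pair_count(n):
    """Number of unordered item pairs produced by a user with n interactions."""
    return n * (n - 1) // 2


def sample_prefs(A, max_prefs_per_user, seed=42):
    """Keep at most max_prefs_per_user randomly chosen interactions per user."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sampled = {}
    for user, prefs in A.items():
        if len(prefs) > max_prefs_per_user:
            kept = rng.sample(sorted(prefs), max_prefs_per_user)
            sampled[user] = {item: prefs[item] for item in kept}
        else:
            sampled[user] = dict(prefs)
    return sampled


A = {
    "power_user": {f"item{k}": 1.0 for k in range(1000)},
    "casual_user": {"item1": 1.0, "item2": 1.0},
}
capped = sample_prefs(A, 500)
print(len(capped["power_user"]), len(capped["casual_user"]))
print(pair_count(1000), pair_count(500))
```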

'maxSimilaritiesPerItem': the maximum number of similar items to keep
per item (i.e. the maximum number of entries per row in S). Research
papers have reported good results with values between 20 and 100.
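
As an illustration, truncating each row of S to its top entries could look like the following Python sketch. It assumes S is held as a dict of dicts, and the function name truncate_similarities is made up; it is not how the distributed job stores S.

```python
import heapq


def truncate_similarities(S, max_similarities_per_item):
    """Keep only the most similar items in each row of S."""
    return {
        item: dict(
            heapq.nlargest(
                max_similarities_per_item, row.items(), key=lambda kv: kv[1]
            )
        )
        for item, row in S.items()
    }


S = {"a": {"b": 0.9, "c": 0.1, "d": 0.5}}
print(truncate_similarities(S, 2))
```

Low-similarity entries such as "c" above contribute little to the final scores, so dropping them mostly saves space and computation.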

'maxPrefsPerUser': the number of interactions per user to take into
account in the final recommendation phase. The handling of this
parameter is probably buggy, so it should be set to a very high number
(at least as large as the maximum number of interactions per user);
otherwise you might see items in the recommendations that the user
already knows.
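
To pick a safe value you need the largest number of interactions any single user has. A small Python sketch, assuming the input is a text file with one "userID,itemID[,value]" record per line and that each user-item pair appears only once (the helper name max_interactions_per_user is hypothetical):

```python
from collections import Counter


def max_interactions_per_user(lines):
    """Return the largest number of interaction records for any one user."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            user_id = line.split(",")[0]
            counts[user_id] += 1
    return max(counts.values()) if counts else 0


records = ["1,101,5.0", "1,102,3.0", "1,103,2.5", "2,101,2.0", "2,104,4.0"]
print(max_interactions_per_user(records))
```

Any maxPrefsPerUser value at or above the returned number follows the advice above.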

In general, the only way to get a picture of the quality of a
recommender is by doing tests in a live system with real users. You can
of course do some hold-out tests or cross-validation offline, but good
performance there does not necessarily correlate with good performance
in a real system.
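
For the offline part, a hold-out test can be sketched in Python as below. This is only one possible setup: it assumes a random per-user split and precision at k as the metric, and the names hold_out_split and precision_at_k are made up for illustration.

```python
import random


def hold_out_split(A, fraction=0.2, seed=0):
    """Hide a random fraction of each user's interactions for testing."""
    rng = random.Random(seed)
    train, test = {}, {}
    for user, prefs in A.items():
        items = sorted(prefs)
        rng.shuffle(items)
        held = set(items[: int(len(items) * fraction)])
        train[user] = {i: prefs[i] for i in items if i not in held}
        test[user] = held
    return train, test


def precision_at_k(recommendations, test, k=10):
    """Average share of the top-k recommendations found among held-out items."""
    scores = []
    for user, held in test.items():
        if not held:
            continue  # nothing was hidden for this user
        top = recommendations.get(user, [])[:k]
        hits = sum(1 for item in top if item in held)
        scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0


recs = {"u": ["a", "b", "c", "d"]}
held_out = {"u": {"a", "z"}}
print(precision_at_k(recs, held_out, k=4))
```

You would compute recommendations from the training part only and compare runs with different parameter values, keeping in mind that such offline numbers are only a rough guide.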

I suggest you start with the default values. Are you using trunk or 0.8?

Best,
Sebastian



2013/9/11 Brian Arnold <barnold4238@gmail.com>

> Hi,
>
> I am currently trying to run the distributed Item Based Collaborative
> filtering algorithm on our Hadoop cluster, and I have a few questions
> regarding tweaking the various properties of the algorithm.  For the
> maxPrefsPerUser, maxSimilaritiesPerItem, and maxPrefsPerUserInItemSimilarity
> properties I was wondering if I could get a more detailed explanation of
> what these properties control.  I saw the description in the code, but I am
> just wondering how changing these values will affect the results of the
> algorithm, and will increasing them result in a better recommendation.
>
> Thanks
>


