mahout-user mailing list archives

From 林伟 <linwe...@gmail.com>
Subject Re: Item Based Collaborative Filtering Properties Question
Date Thu, 12 Sep 2013 15:06:17 GMT
Hi Brian,

The parameter "maxPrefsPerUserInItemSimilarity" is in RecommenderJob. Judging
from the text of its comment, it is the same as the parameter
"maxPrefsPerUser" in ItemSimilarityJob.

The second question is not easy to answer. It depends on your
recommendation scenario and the characteristics of your input data. What
matters most is the quality of your data (for example, the accuracy of the
preference values), not these parameters. These parameters are more related
to the performance of the similarity calculation.

Thanks.


2013/9/12 Brian Arnold <barnold4238@gmail.com>

> Hi,
>
> Thank you for the response!  What you said makes sense.  Here is a link to
> the other property:
>
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.6/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java#RecommenderJob.0DEFAULT_MAX_SIMILARITIES_PER_ITEM
>
> Supposing I have a sufficiently large cluster to process the data, would
> increasing the values necessarily give me a better recommendation?  Which
> do you feel would have the largest impact on the quality of the
> recommendation?
>
> Brian
>
>
> On Thu, Sep 12, 2013 at 7:05 AM, 林伟 <linwei85@gmail.com> wrote:
>
> > Hi Brian & Miliauskas,
> >
> > I am a data mining engineer from the Taobao recommendation team. Over the
> > past month, I have read all the code of Mahout's itemCF.
> > So maybe I can answer this question.
> >
> > We consider the input of itemCF for one user to be an item vector, like
> > this (the notation is borrowed from the JSON object model):
> > <userid, [ {item1, pref(u, i1)}, {item2, pref(u, i2)}, ..., {itemN,
> > pref(u, iN)} ]>
> > So maxPrefsPerUser means the maximum length of the item vector. If a
> > user has preferences for more items than this number, a sample is
> > taken to enforce the limit.
> >
> > We also consider the output of itemCF for one item to be a similarity
> > vector, like this:
> > <item1, [ {item2, sim(2,1)}, {item3, sim(3,1)}, ..., {itemK, sim(K,1)} ]>
> > So maxSimilaritiesPerItem means the maximum length of the similarity
> > vector. If item1 has more similar items than this number, Mahout outputs
> > only the top 'maxSimilaritiesPerItem' items.
> >
> > As for the parameter 'maxPrefsPerUserItemSimilarity', I haven't found it.
> > Can you give me a link to it?
> >
> > Thanks
> >
> >
> >
> > 2013/9/12 Darius Miliauskas <dariui.miliauskui@gmail.com>
> >
> > > Hi, Brian,
> > >
> > > This question is also relevant for me. Perhaps somebody will give more
> > > details, because I am just learning myself. But I guess you can try
> > > changing the parameters, check the performance, and write about it here
> > > so that everybody learns more!
> > >
> > > In general, if these values are lower, the jobs should run faster,
> > > because Mahout is built on Hadoop and lower limits mean less data to
> > > process. I think it could help if you try the algorithm on several
> > > pieces of data and see whether you are missing some important
> > > recommendations. Say you set "maxSimilaritiesPerItem" to 4 and you miss
> > > some recommendations; then you should increase the value. It is a
> > > balance between performance and better results, and you should find
> > > that balance. I hope you will share more details about what you find
> > > out, because I have noticed that here (on the Mahout mailing list)
> > > everybody is asking but only a few are replying and sharing.
> > >
> > >
> > > Thanks,
> > >
> > > Darius
> > >
> > >
> > > 2013/9/12 Brian Arnold <barnold4238@gmail.com>
> > >
> > > > Hi,
> > > >
> > > > I am currently trying to run the distributed Item Based Collaborative
> > > > filtering algorithm on our Hadoop cluster, and I have a few questions
> > > > regarding tweaking the various properties of the algorithm.  For the
> > > > maxPrefsPerUser, maxSimilaritiesPerItem, and
> > > > maxPrefsPerUserItemSimilarity properties I was wondering if I could
> > > > get a more detailed explanation of what these properties control.  I
> > > > saw the description in the code, but I am just wondering how changing
> > > > these values will affect the results of the algorithm, and will
> > > > increasing them result in a better recommendation.
> > > >
> > > > Thanks
> > > >
> > >
> >
>
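
The two limits described in this thread can be illustrated with a minimal
sketch. This is not Mahout code: the function names, the sample data, and the
choice of uniform random sampling are assumptions made for illustration (the
thread only says "a sample" is applied when a user exceeds maxPrefsPerUser).

```python
import heapq
import random


def cap_user_prefs(prefs, max_prefs_per_user, seed=0):
    """Return at most max_prefs_per_user (item, pref) pairs.

    Assumption: the sample is uniform and without replacement. The thread
    does not say which sampling strategy Mahout actually uses.
    """
    prefs = list(prefs)
    if len(prefs) <= max_prefs_per_user:
        return prefs
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(prefs, max_prefs_per_user)


def top_similar_items(similarities, max_similarities_per_item):
    """Keep only the highest-scoring (item, similarity) pairs for one item."""
    return heapq.nlargest(
        max_similarities_per_item, similarities, key=lambda pair: pair[1]
    )


# Hypothetical item vector for one user: ten (item, preference) pairs.
user_prefs = [("item%d" % i, float(i % 5 + 1)) for i in range(10)]
capped = cap_user_prefs(user_prefs, max_prefs_per_user=4)

# Hypothetical similarity vector for item1.
item1_sims = [("item2", 0.91), ("item3", 0.15), ("item4", 0.62), ("item5", 0.40)]
top = top_similar_items(item1_sims, max_similarities_per_item=2)

print(len(capped))  # 4
print(top)          # [('item2', 0.91), ('item4', 0.62)]
```

Raising either limit keeps more data (longer item vectors, longer similarity
vectors) at the cost of more computation, which is the trade-off discussed
above.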
