mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: SGD Based Recommender Contribution Proposal
Date Thu, 06 Sep 2012 21:48:39 GMT
This sounds pretty exciting.  Beyond that, it is hard to say much.

Can you say a bit more about how you would see introducing the code into
Mahout?

On Thu, Sep 6, 2012 at 9:14 AM, Gokhan Capan <gkhncpn@gmail.com> wrote:

> By the way, I want to mention that my thesis is advised by Ozgur Yilmazel,
> who is a founding member of the Mahout project. I conducted this study and
> kept the implementation integrable to Mahout with his guidance.
>
> On Thu, Sep 6, 2012 at 6:04 PM, Gokhan Capan <gkhncpn@gmail.com> wrote:
>
> > Dear Mahout community,
> >
> > I would like to introduce a set of tools for recommender systems that I
> > implemented as part of my MSc thesis. The work is inspired by our
> > conversations on the user list, and I tried to keep it compatible with
> > the existing Taste framework for possible contribution to Mahout.
> >
> > The library is available at github.com/gcapan/recommender
> > <http://github.com/gcapan>.
> >
> >
> > The library contains Stochastic Gradient Descent based learning
> > algorithms for Matrix Factorization based recommendation.
> >
> > Core features of the library are listed below:
> >
> > 1- It handles different recommendation targets (feedback types), namely:
> >     - Standard numerical recommendation with OLS Regression
> >     - Binary recommendation with Logistic Regression
> >     - Multinomial recommendation with Softmax Regression
> >     - Ordinal recommendation with Proportional Odds Model
> >     - Predicting counts with Poisson Regression (still experimental)
> >
> > 2- It can use side information about users and items, if available
> >
> > 3- It can leverage what I call dynamic side information, meaning
> > features whose values are determined at feedback time (e.g. day of week
> > for its possible effect on people's choices, proximity for
> > location-aware recommendation, etc.)
> >
> > 4- It is an online learning algorithm, and thus scalable. However, the
> > model is currently stored in memory. I plan to extend it to store the
> > model in HBase, too.
> >
> >
> > The recommenders implement Mahout's Recommender interface. For
> > experiments, I have implemented a GenericIncrementalDataModel (in
> > memory) and List-based PreferenceArrays.
> >
> > I tried to use Mahout's data structures where available. For example,
> > factor vectors and side info vectors are in Mahout's vector format.
> >
> > These algorithms are heavily inspired by various influential recommender
> > system papers, especially those by Yehuda Koren. For example, the
> > ordinal model is from Koren's OrdRec paper, except that the cuts are
> > global rather than user-specific.
> >
> > I tried the numerical recommender on the MovieLens-1M dataset, and it
> > achieved around 0.851 RMSE with 150 factors and 30 iterations.
> >
> > The code is tested, but not fully documented.
> >
> > With some effort, the code can be integrated into Mahout. If it could
> > be useful to Mahout users, I will be happy to contribute it to the ASF
> > with your guidance.
> >
> > Any feedback is appreciated.
> >
> > Regards
> >
> > --
> > Gokhan
>
>
>
>
> --
> Gokhan
>
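
For readers following the thread, here is a minimal sketch of the kind of
online SGD update that matrix factorization recommenders of this sort are
built on. It is not taken from the library at github.com/gcapan/recommender
and does not use Mahout's Recommender interface or vector classes. It
assumes the plain numerical case (squared-error loss with L2
regularization) and ignores side information. The class name SgdMfSketch
and all parameter names are hypothetical.

```java
import java.util.Random;

/** Illustrative only: online SGD for matrix factorization with squared-error loss. */
final class SgdMfSketch {
    final double[][] userFactors;   // one latent vector per user
    final double[][] itemFactors;   // one latent vector per item
    final double learningRate;
    final double lambda;            // L2 regularization strength

    SgdMfSketch(int numUsers, int numItems, int numFactors,
                double learningRate, double lambda, long seed) {
        Random rnd = new Random(seed);
        this.userFactors = randomMatrix(numUsers, numFactors, rnd);
        this.itemFactors = randomMatrix(numItems, numFactors, rnd);
        this.learningRate = learningRate;
        this.lambda = lambda;
    }

    /** Small random initial values so that the gradients are not all zero. */
    static double[][] randomMatrix(int rows, int cols, Random rnd) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int c = 0; c < cols; c++) {
                row[c] = 0.1 * rnd.nextGaussian();
            }
        }
        return m;
    }

    /** Predicted rating: dot product of the user and item factor vectors. */
    double predict(int user, int item) {
        double sum = 0.0;
        for (int f = 0; f < userFactors[user].length; f++) {
            sum += userFactors[user][f] * itemFactors[item][f];
        }
        return sum;
    }

    /** One online update from a single (user, item, rating) observation. */
    void update(int user, int item, double rating) {
        double err = rating - predict(user, item);
        double[] p = userFactors[user];
        double[] q = itemFactors[item];
        for (int f = 0; f < p.length; f++) {
            double pf = p[f];   // keep old values so both updates use the same point
            double qf = q[f];
            p[f] += learningRate * (err * qf - lambda * pf);
            q[f] += learningRate * (err * pf - lambda * qf);
        }
    }

    /** Root mean squared error over the given (user, item) pairs. */
    double rmse(int[][] pairs, double[] ratings) {
        double sumSq = 0.0;
        for (int i = 0; i < pairs.length; i++) {
            double err = ratings[i] - predict(pairs[i][0], pairs[i][1]);
            sumSq += err * err;
        }
        return Math.sqrt(sumSq / pairs.length);
    }
}
```

Because update() touches only one user vector and one item vector per
observation, feedback can be fed in as it arrives, which is what makes this
style of learner online. Training on a fixed dataset amounts to calling
update() for every observation, repeated over several passes.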
