mahout-user mailing list archives

From: Gokhan Capan <gkhn...@gmail.com>
Subject: Re: SGD Based Recommender Contribution Proposal
Date: Sun, 09 Sep 2012 14:01:19 GMT
On Fri, Sep 7, 2012 at 12:48 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> This sounds pretty exciting.  Beyond that, it is hard to say much.
>
> Can you say a bit more about how you would see introducing the code into
> Mahout?
>

Ted, I've forked apache/mahout on GitHub, and I will merge the library into
Mahout. I expect that within a week I will be able to add documentation and
Mahout jobs for the experiments, and start submitting patches to JIRA.


> On Thu, Sep 6, 2012 at 9:14 AM, Gokhan Capan <gkhncpn@gmail.com> wrote:
>
> > By the way, I want to mention that my thesis is advised by Ozgur Yilmazel,
> > who is a founding member of the Mahout project. I conducted this study and
> > kept the implementation easy to integrate into Mahout under his guidance.
> >
> > On Thu, Sep 6, 2012 at 6:04 PM, Gokhan Capan <gkhncpn@gmail.com> wrote:
> >
> > > Dear Mahout community,
> > >
> > > I would like to introduce a set of tools for recommender systems that were
> > > implemented as part of my MSc thesis. This work was inspired by our
> > > conversations on the user list, and I tried to keep it compatible with the
> > > existing Taste framework for a possible contribution to Mahout.
> > >
> > > The library is available at github.com/gcapan/recommender.
> > >
> > >
> > > The library contains Stochastic Gradient Descent (SGD) based learning
> > > algorithms for Matrix Factorization based recommendation.
> > >
> > > Core features of the library are listed below:
> > >
> > > 1- It handles different recommendation targets (feedback types), namely:
> > >     - Standard numerical recommendation with OLS Regression
> > >     - Binary recommendation with Logistic Regression
> > >     - Multinomial recommendation with Softmax Regression
> > >     - Ordinal recommendation with Proportional Odds Model
> > >     - Predicting counts with Poisson Regression (still experimental)
> > >
> > > 2- It can use side information about users and items, if available.
> > >
> > > 3- It can leverage what I call dynamic side information: features whose
> > > values are determined at feedback time (e.g. day of week, for its possible
> > > effect on people's choices, or proximity, for location-aware
> > > recommendation).
> > >
> > > 4- It is an online learning algorithm and is thus scalable. However, the
> > > model is currently stored in memory; I plan to extend it to store the
> > > model in HBase as well.
> > >
> > >
> > > The recommenders implement Mahout's Recommender interface. For
> > > experiments, I have implemented a GenericIncrementalDataModel (in memory)
> > > and List-based PreferenceArrays.
> > >
> > > I tried to use Mahout's data structures where available. For example,
> > > factor vectors and side info vectors are in Mahout's vector format.
> > >
> > > These algorithms are heavily inspired by several influential recommender
> > > system papers, especially those by Yehuda Koren. For example, the Ordinal
> > > model is from Koren's OrdRec paper, except that the cuts are global rather
> > > than user-specific.
> > >
> > > I tried the numerical recommender on the MovieLens-1M dataset, and it
> > > achieved around 0.851 RMSE with 150 factors and 30 iterations.
> > >
> > > The code is tested, but not fully documented.
> > >
> > > With some effort, the code can be integrated into Mahout. If it has the
> > > potential to benefit Mahout users, I will be happy to contribute it to the
> > > ASF with your guidance.
> > >
> > > Any feedback is appreciated.
> > >
> > > Regards
> > >
> > > --
> > > Gokhan
> >
> >
> >
> >
> > --
> > Gokhan
> >
>
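The feedback targets listed in the proposal (OLS regression for numerical ratings, logistic regression for binary feedback) and the online-learning claim can be illustrated with a minimal sketch of biased matrix factorization trained one observation at a time by SGD. This is not the code from github.com/gcapan/recommender: the class name SgdMatrixFactorization, the bias terms, and the hyperparameters (k, lr, reg) are assumptions chosen for illustration. Only the identity and logistic links are shown; the softmax, proportional-odds, and Poisson targets would each need their own link and gradient.

```python
import math
import random


class SgdMatrixFactorization:
    """Hypothetical sketch: biased matrix factorization trained online with SGD.

    score(u, i) = mu + b_u + b_i + p_u . q_i
    link="identity": numerical feedback, half squared loss (OLS-style)
    link="logistic": binary feedback, log loss (logistic regression)
    """

    def __init__(self, n_users, n_items, k=10, lr=0.05, reg=0.01,
                 link="identity", seed=0):
        rnd = random.Random(seed)
        self.P = [[rnd.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
        self.Q = [[rnd.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
        self.bu = [0.0] * n_users
        self.bi = [0.0] * n_items
        self.mu = 0.0  # global offset, e.g. the mean rating for the identity link
        self.lr, self.reg, self.link = lr, reg, link

    def score(self, u, i):
        dot = sum(p * q for p, q in zip(self.P[u], self.Q[i]))
        return self.mu + self.bu[u] + self.bi[i] + dot

    def predict(self, u, i):
        s = self.score(u, i)
        return 1.0 / (1.0 + math.exp(-s)) if self.link == "logistic" else s

    def update(self, u, i, y):
        """One online SGD step on a single (user, item, feedback) observation."""
        # With either link above, the negative gradient of the loss with
        # respect to the score is simply (y - prediction).
        err = y - self.predict(u, i)
        self.bu[u] += self.lr * (err - self.reg * self.bu[u])
        self.bi[i] += self.lr * (err - self.reg * self.bi[i])
        pu, qi = self.P[u], self.Q[i]
        for f in range(len(pu)):
            p, q = pu[f], qi[f]  # use the old values for both updates
            pu[f] += self.lr * (err * q - self.reg * p)
            qi[f] += self.lr * (err * p - self.reg * q)


# Usage: a tiny 2x2 rating matrix, streamed through the model 200 times.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 1.0)]
model = SgdMatrixFactorization(n_users=2, n_items=2, k=4)
model.mu = sum(r for _, _, r in ratings) / len(ratings)
for _ in range(200):
    for u, i, r in ratings:
        model.update(u, i, r)

# Binary feedback with the logistic link: item 0 liked, item 1 not.
clicks = [(0, 0, 1.0), (0, 1, 0.0)]
binary = SgdMatrixFactorization(n_users=1, n_items=2, k=4, link="logistic")
for _ in range(200):
    for u, i, y in clicks:
        binary.update(u, i, y)
```

Because each update touches only one user's and one item's parameters, the model can absorb new feedback as it arrives without a batch retraining pass, which is the sense in which such a learner is "online".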


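The Ordinal model mentioned in the proposal follows Koren's OrdRec, but with global cuts. As a rough illustration, here is a proportional-odds (cumulative logit) model with thresholds shared by all users, assuming the common parametrization P(r <= k) = sigmoid(theta_k - s_ui), where s_ui is the matrix-factorization score. The function names and the example cut values are hypothetical, and OrdRec's own parametrization of the thresholds may differ.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probabilities(score, cuts):
    """P(r = k) for k = 1..len(cuts) + 1 under a cumulative-logit model.

    P(r <= k) = sigmoid(cuts[k-1] - score). The cuts are strictly increasing
    and shared by all users (global), not fitted per user.
    """
    if any(a >= b for a, b in zip(cuts, cuts[1:])):
        raise ValueError("cuts must be strictly increasing")
    cumulative = [sigmoid(c - score) for c in cuts] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(r = k) = P(r <= k) - P(r <= k - 1)
        previous = c
    return probs


def expected_rating(probs):
    """Mean of the predicted rating distribution, ratings numbered 1..K."""
    return sum((k + 1) * p for k, p in enumerate(probs))


# Usage: three rating levels with symmetric global cuts.
cuts = [-1.0, 1.0]
neutral = ordinal_probabilities(0.0, cuts)  # approximately [0.269, 0.462, 0.269]
liked = ordinal_probabilities(2.0, cuts)    # mass shifts toward the top rating
```

Raising the score s_ui slides the whole distribution toward higher ratings while the same cuts apply to every user, which is the "global cuts" simplification relative to user-specific thresholds.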

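The 0.851 figure reported on MovieLens-1M is a root mean squared error. For reference, a minimal sketch of how RMSE is typically computed over held-out (user, item, rating) triples; the function name and the toy predictor are hypothetical, and this does not reproduce the experiment's train/test split, which the message does not describe.

```python
import math


def rmse(test_triples, predict):
    """Root mean squared error of predict(user, item) over held-out ratings."""
    if not test_triples:
        raise ValueError("no test ratings")
    squared = [(rating - predict(user, item)) ** 2
               for user, item, rating in test_triples]
    return math.sqrt(sum(squared) / len(squared))


# Usage: a constant predictor that is off by exactly one star on each rating.
held_out = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 4.0)]
print(rmse(held_out, lambda user, item: 3.0))  # 1.0
```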
-- 
Gokhan
