mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: K-Means as a surrogate for Matrix Factorization
Date Sun, 07 Oct 2012 14:54:09 GMT
For reference, the talk I just gave at Oxford has a little bit on data
sparsification in it.  See

http://www.slideshare.net/tdunning/oxford-05oct2012

For most kinds of modeling, cluster proximity features significantly
outperform the original variables unless the originals are very carefully
designed.
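
To make the cluster proximity idea concrete, here is a rough sketch (Python
with numpy/scikit-learn rather than Mahout, purely for illustration; the data,
names and parameter values are all made up): the distances from each example
to the k cluster centroids become the new feature vector, optionally
sparsified down to the few closest clusters.

  import numpy as np
  from sklearn.cluster import KMeans

  # Original feature vectors, one row per example (random stand-in data).
  X = np.random.rand(1000, 50)

  k = 20  # the number of clusters is a tuning choice, not a recommendation
  km = KMeans(n_clusters=k, random_state=0).fit(X)

  # Distance from each example to every centroid; these k proximities
  # replace (or augment) the original 50 variables as model features.
  proximity = km.transform(X)                 # shape (1000, k)

  # Optional sparsification: keep only the m nearest clusters per example.
  m = 3
  nearest = np.argsort(proximity, axis=1)[:, :m]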

On Sun, Oct 7, 2012 at 10:41 AM, Johannes Schulte <
johannes.schulte@gmail.com> wrote:

> Ok, I got the idea of post-clustering the projections to introduce
> non-linearity or some other magic effects. I wonder, though, what is
> preferable: directly using all factors to get a sparser feature vector, or
> pre-clustering. It probably depends :)
>
> Thanks for the feedback Ted!
>
> I will continue my quest of how to construct a CTR prediction for
> recommendation delivery. Maybe I should have pointed that goal out before.
>
> On Fri, Oct 5, 2012 at 11:36 PM, Ted Dunning <ted.dunning@gmail.com>
> wrote:
>
> > On Fri, Oct 5, 2012 at 4:57 PM, Johannes Schulte <johannes.schulte@gmail.com> wrote:
> >
> > > Hi Ted,
> > >
> > > thanks for the hints. I am however wondering what the reverse projection
> > > would be needed for. Do you mean for explaining stuff only? Or validating a
> > > model manually?
> > >
> >
> > Or for converting recommendations back to items.
> >
> >
> > > Also, your idea is to first reduce the dimensionality via random projection
> > > (as opposed to matrix factorization??) and then do a clustering in the new
> > > space to derive features, right?
> > >
> >
> > Well, a good random projection *is* roughly equivalent to part of a matrix
> > factorization but other than that nit, you are correct.
> >
> >
> > > Can you point out how that would be different from using an SVD reduction
> > > as a feature generation technique? If I got it right it's for scalability /
> > > performance reasons, right?
> > >
> >
> > I really have no idea if this would make any major difference.  The
> > theoretical difference is that the cluster distance transform is non-linear,
> > which might help with some things.
> >
> >
> > > I got the feeling that for me it's easier to start with a simple k-means
> > > code and tweak that if you have only a single machine at hand. All the
> > > non-distributed MF algorithms are either slow or not really suited for
> > > binary data, if I understand correctly. With k-means I can avoid my biggest
> > > factor (users).
> > >
> >
> > With k-means you either expose items and average over users to get clusters
> > of items (similar to item-based operations) or you build clusters of users
> > based on their item history.  This dichotomy is exactly equivalent to the
> > similar choices with conventional recommenders.
> >
> >
> > > I'm really looking forward to the streaming k-means stuff!
> > >
> >
> > Me too.  Need to get it finished.
> >
>
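
To tie the thread together, a rough end-to-end sketch of the pipeline
discussed above: random projection of the binary user histories, k-means in
the reduced space, cluster distances as features, and an approximate reverse
projection to read the clusters (or recommendations) back in item terms.
Again this is illustrative Python/numpy, not Mahout code, and every size,
density and name in it is an assumption.

  import numpy as np
  from sklearn.cluster import KMeans

  n_users, n_items, k_dim, k_clusters = 2000, 5000, 50, 20

  # Binary user x item history matrix (dense here only to keep the sketch short).
  A = (np.random.rand(n_users, n_items) < 0.01).astype(float)

  # Gaussian random projection: roughly the item-side factor of a matrix
  # factorization, chosen at random instead of learned.
  R = np.random.randn(n_items, k_dim) / np.sqrt(k_dim)
  U = A @ R                                    # users in the reduced space

  # Cluster users in the reduced space; distances to the centroids are the
  # non-linear features, so the model never sees the raw user dimension.
  km = KMeans(n_clusters=k_clusters, random_state=0).fit(U)
  user_features = km.transform(U)              # shape (n_users, k_clusters)

  # Approximate reverse projection: R is nearly orthonormal, so R^T maps the
  # centroids back toward item space, e.g. to see which items dominate a cluster.
  centroid_items = km.cluster_centers_ @ R.T   # shape (k_clusters, n_items)
  top_items = np.argsort(-centroid_items, axis=1)[:, :10]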
