mahout-user mailing list archives

From Johannes Schulte <johannes.schu...@gmail.com>
Subject Re: K-Means as a surrogate for Matrix Factorization
Date Sun, 07 Oct 2012 09:41:21 GMT
Ok, I got the idea of post-clustering the projections to introduce
non-linearity or some other magic effects. I do wonder, though, what's
favourable: using all the factors directly, or pre-clustering to get a
sparser feature vector. It probably depends :)
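
To make the comparison concrete for myself, here is a rough sketch in
plain Python/NumPy (toy data, made-up sizes, and scikit-learn's KMeans
just for brevity) of the two feature constructions I mean:

import numpy as np
from sklearn.cluster import KMeans  # assuming scikit-learn is at hand

rng = np.random.default_rng(42)

# Toy binary user-item matrix: 100 users x 1000 items (made-up sizes).
A = (rng.random((100, 1000)) < 0.05).astype(float)

# Random projection down to k = 20 dimensions.
k = 20
R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(1000, k))
factors = A @ R  # option 1: use the dense projected factors directly

# Option 2: cluster the projections and use centroid distances (or the
# hard assignments km.labels_, a sparse one-hot encoding) as features.
km = KMeans(n_clusters=10, n_init=10).fit(factors)
dist_features = km.transform(factors)  # distances to centroids, shape (100, 10)

The distance / assignment features are a non-linear function of the
original rows, while the raw projection is purely linear, so I guess
which one wins depends on the downstream model.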

Thanks for the feedback Ted!

I will continue my quest of constructing a CTR prediction for
recommendation delivery. Maybe I should have pointed that goal out before.

On Fri, Oct 5, 2012 at 11:36 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> On Fri, Oct 5, 2012 at 4:57 PM, Johannes Schulte
> <johannes.schulte@gmail.com> wrote:
>
> > Hi Ted,
> >
> > thanks for the hints. I am however wondering what the reverse projection
> > would be needed for. Do you mean for explaining stuff only? Or validating
> > a model manually?
> >
>
> Or for converting recommendations back to items.
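
To make sure I picture the reverse projection right, a minimal sketch
(hypothetical names and sizes; the pseudo-inverse of the projection
matrix maps a reduced-space vector, e.g. a centroid, back to
approximate item scores):

import numpy as np

rng = np.random.default_rng(0)
n_items, k = 1000, 20
R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(n_items, k))

# A recommendation produced in the reduced space, e.g. a cluster centroid.
reduced_vec = rng.normal(size=k)

# Reverse projection: the pseudo-inverse of R maps the reduced space back
# to item space, giving approximate per-item scores.
item_scores = reduced_vec @ np.linalg.pinv(R)

top_items = np.argsort(item_scores)[::-1][:10]  # recommendations as items again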
>
>
> > Also, your idea is to first reduce the dimensionality via random
> > projection (as opposed to matrix factorization??) and then do a
> > clustering in the new space to derive features, right?
> >
>
> Well, a good random projection *is* roughly equivalent to part of a matrix
> factorization, but other than that nit, you are correct.
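
Numerically, the way I read that nit (my sketch, not anything from
Mahout): with projection matrix R, the product A R plays the role of a
left factor and the pseudo-inverse of R the role of a right factor, so
(A R) pinv(R) is a rank-k approximation of A:

import numpy as np

rng = np.random.default_rng(1)
A = (rng.random((100, 1000)) < 0.05).astype(float)  # toy binary user-item matrix

k = 20
R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(1000, k))

left = A @ R               # plays the left factor, shape (100, k)
right = np.linalg.pinv(R)  # plays the right factor, shape (k, 1000)

A_hat = left @ right       # rank-k reconstruction of A
err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
print(f"relative reconstruction error at rank {k}: {err:.3f}")

Unlike a truncated SVD at the same rank, these factors are not optimal,
which I take to be the "roughly" part.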
>
>
> > Can you point out how that would be different from using an SVD reduction
> > as a feature generation technique? If I got it right, it's for
> > scalability / performance reasons, right?
> >
>
> I really have no idea if this would make any major difference.  The
> theoretical difference is that the cluster distance transform is
> non-linear, which might help with some things.
>
>
> > I have the feeling that, for me, it's easier to start with a simple
> > k-means code and tweak that when you have only a single machine at hand.
> > All the non-distributed MF algorithms are either slow or not really
> > suited for binary data, if I get everything right. With k-means I can
> > avoid my biggest factor (users).
> >
>
> With k-means you either expose items and average over users to get clusters
> of items (similar to item-based operations), or you build clusters of users
> based on their item history.  This dichotomy is exactly equivalent to the
> similar choices with conventional recommenders.
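
That dichotomy is easy to state in code (same toy matrix as before,
scikit-learn assumed): cluster the rows to group users by their item
histories, or cluster the columns to group items by the users who
touched them.

import numpy as np
from sklearn.cluster import KMeans  # assuming scikit-learn is available

rng = np.random.default_rng(7)
A = (rng.random((100, 1000)) < 0.05).astype(float)  # users x items, binary

# Clusters of users based on their item histories (rows of A).
user_clusters = KMeans(n_clusters=8, n_init=10).fit_predict(A)

# Clusters of items described by their user vectors (rows of A.T),
# analogous to item-based operations in a conventional recommender.
item_clusters = KMeans(n_clusters=8, n_init=10).fit_predict(A.T)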
>
>
> > I'm really looking forward to the streaming k-means stuff!
> >
>
> Me too.  Need to get it finished.
>
