mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: K-Means as a surrogate for Matrix Factorization
Date Fri, 05 Oct 2012 21:36:50 GMT
On Fri, Oct 5, 2012 at 4:57 PM, Johannes Schulte <johannes.schulte@gmail.com> wrote:

> Hi Ted,
>
> Thanks for the hints. I am, however, wondering what the reverse projection
> would be needed for. Do you mean only for explaining things, or for
> validating a model manually?
>

Or for converting recommendations back to items.
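
To make that concrete, here is a rough sketch in numpy rather than Mahout
(the sizes are made up, and using the transpose as an approximate inverse is
an assumption that only holds for near-orthonormal projections):

    import numpy as np

    rng = np.random.default_rng(0)
    n_items, k = 10_000, 100

    # Random projection from item space down to k dimensions; scaling by
    # 1/sqrt(n_items) makes the columns of R nearly orthonormal.
    R = rng.normal(size=(n_items, k)) / np.sqrt(n_items)

    # A recommendation computed in the reduced space.
    rec_reduced = rng.normal(size=k)

    # Reverse projection: because R.T @ R is close to the identity, R acts
    # as an approximate inverse of the projection x -> R.T @ x, so mapping
    # back gives per-item scores that can be ranked.
    item_scores = R @ rec_reduced
    top_items = np.argsort(item_scores)[::-1][:10]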


> Also, your idea is to first reduce the dimensionality via random projection
> (as opposed to matrix factorization?) and then cluster in the new space to
> derive features, right?
>

Well, a good random projection *is* roughly equivalent to part of a matrix
factorization, but other than that nit, you are correct.
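
The connection, roughly: a random projection captures approximately the same
dominant subspace that a truncated SVD recovers, and randomized SVD methods
are built on exactly that fact. A small numpy sketch (dimensions invented):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((1000, 500))      # stand-in for a user x item matrix
    k = 20

    # Push A through a random matrix; the range of A @ omega approximates
    # the top-k column space of A, the same subspace a rank-k SVD finds.
    omega = rng.normal(size=(A.shape[1], k))
    Q, _ = np.linalg.qr(A @ omega)

    # Rank-k approximation from the random projection vs. truncated SVD.
    A_rp = Q @ (Q.T @ A)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_svd = (U[:, :k] * s[:k]) @ Vt[:k]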


> Can you point out how that would be different from using an SVD reduction
> as a feature generation technique? If I understand correctly, it's for
> scalability/performance reasons, right?
>

I really have no idea if this would make any major difference.  The
theoretical difference is that the cluster distance transform is non-linear,
which might help with some things.
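
Concretely, the transform I have in mind maps each point to its vector of
distances to the k-means centroids, which is non-linear in the input where
an SVD projection is not. A sketch using scikit-learn in place of Mahout
(all parameters arbitrary):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.random((1000, 50))       # e.g. randomly projected user vectors

    km = KMeans(n_clusters=25, n_init=10, random_state=0).fit(X)

    # Distance to every centroid becomes a feature; unlike a linear
    # projection, this map is non-linear in X.
    features = km.transform(X)       # shape (1000, 25)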


> I got the feeling that it's easier for me to start with simple k-means
> code and tweak that when you have only a single machine at hand. All the
> non-distributed MF algorithms are either slow or not really suited for
> binary data, if I understand correctly. With k-means I can avoid my
> biggest factor (users).
>

With k-means you either expose items and average over users to get clusters
of items (similar to item-based operations), or you build clusters of users
based on their item histories.  This dichotomy is exactly equivalent to the
corresponding choice in conventional recommenders.
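
In sketch form, with scikit-learn standing in for Mahout and a synthetic
binary interaction matrix:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Binary user x item matrix.
    A = (rng.random((2000, 300)) < 0.05).astype(float)

    # Cluster users by their item histories (rows of A).
    user_clusters = KMeans(n_clusters=50, n_init=10, random_state=0).fit_predict(A)

    # Or cluster items by the users who touched them (columns of A),
    # analogous to item-based operations.
    item_clusters = KMeans(n_clusters=30, n_init=10, random_state=0).fit_predict(A.T)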


> I'm really looking forward to the streaming k-means stuff!
>

Me too.  I need to get it finished.
