OK, I get the idea of post-clustering the projections to introduce
nonlinearity or some other useful effects. I do wonder, though, what's
preferable: using all factors directly for a sparser feature vector, or
pre-clustering. It probably depends :)
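To make sure I understand the pipeline we're talking about, here is a toy numpy sketch of it: random-project binary user-item vectors, run a plain k-means in the projected space, and use the distances to the cluster centers as the (nonlinear) features. All shapes, k, and the data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary user-item interaction matrix (users x items); stand-in for real data.
X = (rng.random((100, 500)) < 0.05).astype(float)

# Random projection down to d dimensions (roughly a cheap partial factorization).
d = 20
R = rng.normal(size=(500, d)) / np.sqrt(d)
Z = X @ R

def kmeans(M, k, iters=50, seed=0):
    """Plain Lloyd's k-means, just enough for the sketch."""
    r = np.random.default_rng(seed)
    centers = M[r.choice(len(M), k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(M[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = M[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

centers = kmeans(Z, k=10)

# Cluster-distance transform: this is the nonlinear feature map in question.
features = np.linalg.norm(Z[:, None] - centers[None], axis=2)
print(features.shape)  # (100, 10)
```

The distances could then feed a downstream CTR model instead of (or alongside) the raw projected factors.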
Thanks for the feedback Ted!
I will continue my quest to construct a CTR prediction model for
recommendation delivery. Maybe I should have pointed out that goal before.
On Fri, Oct 5, 2012 at 11:36 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> On Fri, Oct 5, 2012 at 4:57 PM, Johannes Schulte
> <johannes.schulte@gmail.com> wrote:
>
> > Hi Ted,
> >
> > thanks for the hints. I am however wondering what the reverse projection
> > would be needed for. Do you mean for explaining stuff only? Or validating
> > a model manually?
> >
>
> Or for converting recommendations back to items.
>
>
> > Also, your idea is to first reduce the dimensionality via random
> > projection (as opposed to matrix factorization??) and then do a
> > clustering in the new space to derive features, right?
> >
>
> Well, a good random projection *is* roughly equivalent to part of a matrix
> factorization but other than that nit, you are correct.
>
>
> > Can you point out how that would be different from using an SVD reduction
> > as a feature generation technique? If I got it right, it's for
> > scalability / performance reasons, right?
> >
>
> I really have no idea if this would make any major difference. The
> theoretical difference is that the cluster distance transform is nonlinear
> which might help with some things.
>
>
> > I've got the feeling that for me it's easier to start with a simple
> > KMeans code and tweak that if you only have a single machine at hand.
> > All the nondistributed MF algorithms are either slow or not really
> > suited for binary data, if I get everything right. With KMeans I can
> > avoid my biggest factor (users).
> >
>
> With kmeans you either expose items and average over users to get clusters
> of items (similar to itembased operations) or you build clusters of users
> based on their item history. This dichotomy is exactly equivalent to the
> similar choices with conventional recommenders.
>
>
> > I'm really looking forward to the streaming kmeans stuff!
> >
>
> me too. Need to get it finished.
>
