On Mar 25, 2013 4:15 AM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>
> Well, actually, you can.
>
> LSI does exactly that.
>
> What the effect is of doing this is not clear to me. Do you know what
> happens if you assume missing values are 0?
I thought about that . My understanding is that with lsi outting zero is
not lack of observation, it is actually an observation that the tern has
not been used in the document which obwerves it as irrelevant to the
document.
With cf, if you put zero (or, rather, an average of the entries in the
ovservation vector) it is a too severe if a patch for a lack of ovservation.
Typically such patching has terrible effectw in practice and has an effect
similar to lack of learning rat or training, or overregularized result.
Indeed, lets assume my average movie rating is 3. And i am into action
movies so perhaps i rated all good action movies as 5. But ionly warched
100 action movies and netfkix has 10000 such similar movies, so the matrix
input is filled to assume that i have a rating of 5 for 100 and 3 for 9900.
That is it is still gives me barely a nudge when we try to solve a spase
problem with ssvd.
Classic als wr is bypassing underlearning problem by cutting out unrated
entries from linear equations system. It also still has a fery defined
regularization technique which allows to find optimal fit in theory (but
still not in mahout, not without at least some additional sweat, i heard).
D
>
> On Mon, Mar 25, 2013 at 12:10 PM, Sebastian Schelter <ssc@apache.org>
wrote:
>
> > I think one crucial point is missing from this discussion. In
> > Collaborative Filtering the matrix to decompose is only partially
> > defined. A missing entry cannot simply be substituted by 0, it is
> > unknown. Therefore, you cannot use standard SVD techniques as you would
> > use for LSI for example.
> >
> > On 25.03.2013 11:46, Sean Owen wrote:
> > > Points from across several emails 
> > >
> > > The initial itemfeature matrix can be just random unit vectors too. I
> > > have slightly better results with that.
> > >
> > > You are finding the leastsquares solution of A = U M' for U given A
> > > and M. Yes you can derive that analytically as the zero of the
> > > derivative of the error function.
> > >
> > > With m users and n items, and k features, where k=n, then I suppose
> > > you don't need any iterations at all since there is a trivial
> > > solution: U = A, M = I(n) (the nxn identity matrix). You would not
> > > find this on the first iteration, however, if you followed the
> > > algorithm, because you would be starting from some random starting
> > > point. But if you initialized M to the identity matrix, yes you'd find
> > > the exact solution immediately.
> > >
> > > Yes it is an iterative algorithm and you have to define a convergence
> > > criterion. I use average absolute difference in (U M') entries from
> > > one iteration to the next. (Well, a sample.) It's possible that you
> > > reach your criterion in 1 iteration, or, not. It depends on the
> > > criterion. Usually when you restart ALS on updated data, you use the
> > > previous M matrix as a starting point. If the change in data is small,
> > > one iteration should usually leave you still "converged" actually.
> > > But, from random starting point  not typical.
> > >
> > > ALS is similar to SVD only in broad terms. The SVD is not always used
> > > to make a lowrank factorization, and, its outputs carry more
> > > guarantees  they are orthonormal bases because it has factored out
> > > scaling factors into the third matrix Sigma. I think of ALS as more
> > > simplistic and therefore possibly faster. WIth k features I believe
> > > (?) the SVD is necessarily a kiteration process at least, whereas ALS
> > > might be of use after 1 iteration. The SVD is not a "shortcut" for
> > > ALS. If you go to the trouble of a full SVD, you can certainly use
> > > that factorization asis though.
> > >
> > > You need regularization.
> > >
> > >
> > > It should be pointed out that the "ALS" often spoken of here is not
> > > one that factors the input matrix A. There's a variant, that I have
> > > had good results with, for 'implicit' feedback. There, you are
> > > actually factoring the matrix P = (1 : A != 0, 0 : A == 0), and using
> > > values in A as weights in the loss function. You're reconstructing
> > > "interacted or not" and using input value as a confidence measure.
> > > This "works" for ratings although the interpretation in this variant
> > > doesn't line up with the nature of ratings. It works quite nicely for
> > > things like clicks, etc.
> > >
> > > (Much more can be said on this point.)
> > >
> > >
> > >
> > > On Mon, Mar 25, 2013 at 2:19 AM, Dominik Huebner <contact@dhuebner.com
>
> > wrote:
> > >> It's quite hard for me to get the mathematical concepts of the ALS
> > >> recommenders. It would be great if someone could help me to figure
out
> > >> the details. This is my current status:
> > >>
> > >> 1. The itemfeature (M) matrix is initialized using the average
ratings
> > >> and random values (explicit case)
> > >>
> > >> 2. The userfeature (U) matrix is solved using the partial
derivative of
> > >> the error function with respect to u_i (the columns of rowvectors
of U)
> > >>
> > >> Supposed we use as many features as items are known and the error
> > >> function does not use any regularization. Would U be solved within
the
> > >> first iteration? If not, I do not understand why more than one
iteration
> > >> is needed.
> > >> Furthermore, I believe to have understood that using fewer features
than
> > >> items and also applying regularization, does not allow to solve U in
a
> > >> way that the stopping criterion can be met after only one iteration.
> > >> Thus, iteration is required to gradually converge to the stopping
> > >> criterion.
> > >>
> > >> I hope I have pointed out my problems clearly enough.
> > >>
> >
> >
