ALS is a way of solving a matrix equivalent of a regular linear least
squares problem. In a regular least squares problem we solve Ax = b,
i.e., we minimize ||b - Ax|| in the l2 norm. In the case of ALS we try to
solve R ≈ UM in the Frobenius norm. We reduce R ≈ UM to problems of the
kind b ≈ Ax by solving for each row of U in a regular least squares sense.
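To make the "regular" problem concrete, here is a minimal sketch with
made-up numbers (nothing here comes from the thread, it is just ordinary
least squares with numpy):

```python
import numpy as np

# Ordinary least squares: find x minimizing ||A x - b|| in the l2 norm.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x, *_ = np.linalg.lstsq(A, b, rcond=None)
# Here b happens to be exactly A @ [1, 1], so the residual is
# (numerically) zero and x recovers [1, 1].
```

ALS just solves many small problems of exactly this shape, one per row of
U (and later one per row of M).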
Example:
Consider the first row of R, which holds all the ratings given by the
first user. Assume this user has rated movies 1, 213, 315, 1744, 2133,
7044 and 12344, i.e., 7 movies in total. The 7 known ratings for those
movies form a vector of length 7; let this be denoted by b.
We initialize the M matrix in a particular way, and as shown in the figure
above it has n_m (total number of movies) rows and n_f (reduced number of
factors) columns. The matrix U has n_u (total number of users) rows and
n_f columns. For the first user we can form a submatrix of M, denoted
M_u, containing the rows of M corresponding to the movies seen by user u
(the first user in this case). Let u be the row vector of U corresponding
to this user. Now we have the three components b, M_u and u, so we have a
regular least squares problem of the form Ax = b with A = M_u and x = u'
(solving M_u u' ≈ b). Doing this for every user gives the first estimate
of the U matrix. Since this can't be the final U, we continue to
alternately solve for U and M.
You could also try an SVD of R and initialize M = S*V', and use this M in
ALS; in that case you might not need many iterations, since M is not
initialized with random values. Any approximation method that starts from
random initial values needs a certain minimum number of iterations to tend
towards the right values, because each iteration reduces the error only to
a certain extent; we can't remove the whole error in just one iteration.
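A sketch of that SVD initialization, under two stated assumptions: R is
fully observed (a plain SVD does not handle missing entries), and M is
stored as movies x factors (n_m x n_f) as in the figure, so that R ≈ UM'
and the "M = S*V'" initialization corresponds to M = V*S here:

```python
import numpy as np

rng = np.random.default_rng(1)
n_u, n_m, n_f = 6, 8, 2

# Fully observed toy ratings matrix, exactly rank n_f by construction.
R = rng.random((n_u, n_f)) @ rng.random((n_f, n_m))

# Truncated SVD: R ~= A_k diag(s_k) Vt_k, keeping the top n_f factors.
A, s, Vt = np.linalg.svd(R, full_matrices=False)
A_k, s_k, Vt_k = A[:, :n_f], s[:n_f], Vt[:n_f]

# With M as movies x factors (R ~= U M'), the SVD-based init is:
M0 = Vt_k.T * s_k          # n_m x n_f, columns scaled by singular values
U0 = A_k                   # n_u x n_f

# R is exactly rank n_f here, so U0 @ M0' already reconstructs R and
# ALS started this way needs essentially no iterations.
err = np.linalg.norm(U0 @ M0.T - R)
```

On real data R is neither fully observed nor exactly low rank, so this
only provides a better-than-random starting point, not an exact solution.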
I am also new to ALS; this is my initial understanding.
On Mon, Mar 25, 2013 at 6:47 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> If you have an observation matrix that actually is low rank and you start
> with the exact value of U, then one iteration on M will suffice to solve
> the system.
>
> Likewise, if you have M, one iteration on U will suffice.
>
> This holds regardless of the number of features of M or U (which must
> match, btw) as long as the rank of the observation matrix is equal or less
> to that number of features.
>
> Otherwise, more than one iteration is likely to be required.
>
> On Mon, Mar 25, 2013 at 3:19 AM, Dominik Huebner <contact@dhuebner.com
> >wrote:
>
> > It's quite hard for me to get the mathematical concepts of the ALS
> > recommenders. It would be great if someone could help me to figure out
> > the details. This is my current status:
> >
> > 1. The item-feature (M) matrix is initialized using the average ratings
> > and random values (explicit case)
> >
> > 2. The user-feature (U) matrix is solved using the partial derivative of
> > the error function with respect to u_i (the row-vectors of U)
> >
> > Supposed we use as many features as items are known and the error
> > function does not use any regularization. Would U be solved within the
> > first iteration? If not, I do not understand why more than one iteration
> > is needed.
> > Furthermore, I believe to have understood that using fewer features than
> > items and also applying regularization, does not allow to solve U in a
> > way that the stopping criterion can be met after only one iteration.
> > Thus, iteration is required to gradually converge to the stopping
> > criterion.
> >
> > I hope I have pointed out my problems clearly enough.
> >
> >
>

Best regards,
Abhijith
