I find it quite hard to grasp the mathematical concepts behind the ALS
recommenders. It would be great if someone could help me figure out
the details. This is my current understanding:
1. The item-feature matrix (M) is initialized using the average ratings
and random values (explicit case).
2. The user-feature matrix (U) is solved by setting the partial derivative
of the error function with respect to u_i (the column or row vectors of U)
to zero.
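
To make the two steps concrete, here is a minimal NumPy sketch of how I picture them. The function names, the dense ratings matrix R with a boolean mask of observed entries, the range of the random values, and the plain ridge term lam are all my own assumptions for illustration, not the library's actual implementation.

```python
import numpy as np

def init_item_features(R, mask, k, seed=0):
    """Step 1 (assumed form): first feature row = average rating per item,
    remaining rows = small random values."""
    rng = np.random.default_rng(seed)
    M = rng.uniform(0.0, 0.1, size=(k, R.shape[1]))
    counts = np.maximum(mask.sum(axis=0), 1)   # avoid division by zero for unrated items
    M[0, :] = (R * mask).sum(axis=0) / counts
    return M

def solve_user_features(R, mask, M, lam):
    """Step 2: with M fixed, setting the derivative of the squared error
    with respect to u_i to zero gives one linear system per user."""
    k, n_users = M.shape[0], R.shape[0]
    U = np.zeros((k, n_users))
    for i in range(n_users):
        M_i = M[:, mask[i]]                    # features of the items user i rated
        r_i = R[i, mask[i]]                    # the corresponding ratings
        A = M_i @ M_i.T + lam * np.eye(k)      # normal equations plus ridge term
        U[:, i] = np.linalg.solve(A, M_i @ r_i)
    return U

# Toy data: 0.0 marks a missing rating (hypothetical example values).
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0]])
mask = R > 0
M = init_item_features(R, mask, k=2)
U = solve_user_features(R, mask, M, lam=0.1)
```

Each column of U is then the exact minimizer of that user's regularized squared error for the given M.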
Suppose we use as many features as there are known items, and the error
function does not use any regularization. Would U be solved within the
first iteration? If not, I do not understand why more than one iteration
is needed.
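
To check this special case, I tried the following small experiment. It is only a sketch under my own assumptions: M is a random, well-conditioned stand-in (not the average-based initialization), the ratings and the mask of observed entries are made up, and I use np.linalg.lstsq because without regularization the normal equations become singular for users who rated fewer items than there are features.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items = 5, 4
k = n_items                                    # as many features as items

# Made-up ratings and observation mask (assumptions for illustration).
R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)
mask = rng.random((n_users, n_items)) < 0.7
mask[:, 0] = True                              # every user has at least one rating

# Stand-in for the initialized M: identity plus small noise, so it is invertible.
M = np.eye(k) + rng.uniform(0.0, 0.1, size=(k, k))

# One unregularized solve of U with M held fixed.
U = np.zeros((k, n_users))
for i in range(n_users):
    M_i = M[:, mask[i]]                        # k x (number of items rated by user i)
    r_i = R[i, mask[i]]
    U[:, i] = np.linalg.lstsq(M_i.T, r_i, rcond=None)[0]

# Error on the observed ratings after this single solve.
residual = (U.T @ M - R)[mask]
max_abs_residual = np.max(np.abs(residual))
print(max_abs_residual)
```

In this setup the observed ratings are reproduced up to rounding error after the single solve of U, because an invertible M lets every u_i fit its user's ratings exactly. That only concerns the training error, of course; it says nothing about how well unobserved ratings are predicted.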
Furthermore, I believe I have understood that using fewer features than
items, together with regularization, makes it impossible to solve U in
such a way that the stopping criterion is met after only one iteration.
Thus, iteration is required to converge gradually to the stopping
criterion.
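
This is how I picture the alternating loop in that case, again only as a sketch. The helper names, the toy data, the fixed number of ten sweeps, and the plain (unweighted) ridge term lam are my assumptions; a real implementation would presumably stop once the change in some error measure falls below a tolerance.

```python
import numpy as np

def solve_side(R, mask, F, lam):
    """For each row of R, ridge-solve its feature vector against the fixed factors F (k x n)."""
    k = F.shape[0]
    out = np.zeros((k, R.shape[0]))
    for i in range(R.shape[0]):
        F_i = F[:, mask[i]]
        A = F_i @ F_i.T + lam * np.eye(k)
        out[:, i] = np.linalg.solve(A, F_i @ R[i, mask[i]])
    return out

def objective(R, mask, U, M, lam):
    """Squared error on observed ratings plus the regularization term."""
    err = (U.T @ M - R)[mask]
    return np.sum(err ** 2) + lam * (np.sum(U ** 2) + np.sum(M ** 2))

rng = np.random.default_rng(0)
R = rng.integers(1, 6, size=(6, 5)).astype(float)   # made-up ratings
mask = rng.random(R.shape) < 0.7                    # made-up observation pattern
k, lam = 2, 0.1                                     # fewer features than items
M = rng.uniform(0.0, 1.0, size=(k, R.shape[1]))

history = []
for _ in range(10):
    U = solve_side(R, mask, M, lam)                 # fix M, solve U
    M = solve_side(R.T, mask.T, U, lam)             # fix U, solve M
    history.append(objective(R, mask, U, M, lam))

print(history)
```

Since each half-step exactly minimizes the same objective over one of the two matrices, the recorded values can never increase from one sweep to the next, but nothing forces them to reach their final level after the first sweep.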
I hope I have pointed out my problems clearly enough.
