Drilling just a bit more.
If I just use simple Tikhonov regularization, setting both lambdas to 1
(so each penalty is the identity matrix), I iterate like this (MATLAB):
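k = 50;                            % number of features (avoids shadowing MATLAB's rank())
X = rand(size(A, 1), k);           % random initial user factors; A is the m-by-n data matrix
for i = 1:6
    Y = (X'*X + eye(k)) \ (X'*A);  % fix X, solve for Y (k-by-n)
    X = (A*Y') / (Y*Y' + eye(k));  % fix Y, solve for X (m-by-k)
end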
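For reference, the loop above is alternating minimization of the objective below, with A approximated by XY, X of size m-by-k and Y of size k-by-n (k being the number of features set at the top of the loop). Setting the gradient with respect to the free factor to zero gives the two update lines; the identity term in the code corresponds to λ = 1.

```latex
\begin{aligned}
&\min_{X,\,Y}\; \|A - XY\|_F^2 + \lambda \|X\|_F^2 + \lambda \|Y\|_F^2,
  \qquad X \in \mathbb{R}^{m \times k},\; Y \in \mathbb{R}^{k \times n} \\
&\text{fix } X:\quad (X^\top X + \lambda I_k)\, Y = X^\top A \\
&\text{fix } Y:\quad X\, (Y Y^\top + \lambda I_k) = A\, Y^\top
\end{aligned}
```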
Now, can I use weighted regularization and still keep the matrix notation?
It seems to me that I have to solve for one row of X (one column of Y) at
a time.
Is that really so, or am I missing something?
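A minimal NumPy sketch of the row-at-a-time scheme the question describes, assuming "weighted" means a per-entry weight matrix `W` in the squared-error loss (the function name `weighted_als` and its parameters are hypothetical). In general each row of X sees its own weights, so each row gets its own k-by-k normal equations, which is why one matrix-form update no longer covers all rows at once. With all weights equal to 1 it reproduces the matrix-form loop; an ALS-WR-style per-row lambda would slot into the same per-row solve.

```python
import numpy as np


def weighted_als(A, W, k, lam=1.0, iters=6, seed=0):
    """ALS for A ~ X @ Y with per-entry weights W (same shape as A).

    Minimizes sum_ij W[i, j] * (A[i, j] - (X @ Y)[i, j])**2
              + lam * (||X||_F^2 + ||Y||_F^2),
    solving one column of Y, then one row of X, at a time.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    X = rng.random((m, k))          # random initial factors, as in the MATLAB loop
    Y = np.zeros((k, n))
    I = np.eye(k)
    for _ in range(iters):
        # Fix X; column j solves (X' diag(w) X + lam I) y_j = X' diag(w) a_j.
        for j in range(n):
            w = W[:, j]
            G = X.T @ (w[:, None] * X) + lam * I
            Y[:, j] = np.linalg.solve(G, X.T @ (w * A[:, j]))
        # Fix Y; row i solves (Y diag(w) Y' + lam I) x_i = Y diag(w) a_i.
        for i in range(m):
            w = W[i, :]
            G = (Y * w) @ Y.T + lam * I
            X[i, :] = np.linalg.solve(G, Y @ (w * A[i, :]))
    return X, Y


# Usage: weight 0 marks entries that should not influence the fit.
rng = np.random.default_rng(1)
A = rng.random((8, 6))
W = (A > 0.3).astype(float)
X, Y = weighted_als(A, W, k=3)
print(X.shape, Y.shape)  # (8, 3) (3, 6)
```

The per-row loops cost one k-by-k solve per row and column per sweep, which is the price of letting every row carry different weights.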
On Wed, Jan 9, 2013 at 10:13 AM, Koobas <koobas@gmail.com> wrote:
>
>
> On Wed, Jan 9, 2013 at 12:40 AM, Sean Owen <srowen@gmail.com> wrote:
>
>> I think the model you're referring to can use explicit or implicit
>> feedback. It's using the values (however they are derived) as
>> weights in the loss function rather than as values to be approximated
>> directly. So you still use P even with implicit feedback.
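Written out, the model under discussion is presumably the implicit-feedback formulation of Hu, Koren and Volinsky: the observed values r_ui enter only through confidence weights c_ui, and the fitted targets are the binary preferences p_ui. Here x_u is row u of X and y_i is column i of Y; c = 1 + αr is one common choice of weighting and is an assumption, since the thread does not specify it.

```latex
\begin{aligned}
&\min_{X,\,Y}\; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^\top y_i\bigr)^2
  + \lambda\Bigl(\sum_u \|x_u\|^2 + \sum_i \|y_i\|^2\Bigr) \\
&p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & \text{otherwise} \end{cases},
  \qquad c_{ui} = 1 + \alpha\, r_{ui}
\end{aligned}
```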
>>
>> Of course, you can also use ALS to factor R directly if you want.
>>
> Yes, I see it now.
> It is weighted regression, whether the data is explicit or implicit.
> Thank you so much.
> I think I finally got the picture.
>
>
>> Overfitting is as much of an issue as in any ML algorithm. It's hard to
>> quantify more than that, but you certainly don't want to use
>> lambda = 0.
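One concrete symptom of lambda = 0 (not the whole overfitting story): when only a user's observed items enter that user's normal equations, a user with fewer observed items than features gets a singular k-by-k system. A NumPy sketch with made-up sizes (k = 50 features, 30 observed items) and random stand-in factors:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 50                           # number of features, as in the MATLAB loop
Y_obs = rng.random((k, 30))      # hypothetical factors of the 30 items this user rated
gram = Y_obs @ Y_obs.T           # k-by-k matrix in this user's normal equations

# lambda = 0: rank is at most 30 < k, so the system is singular and the
# user's factor vector is not uniquely determined.
print(np.linalg.matrix_rank(gram))
# lambda = 1 (the eye(...) term in the MATLAB loop): positive definite, full rank.
print(np.linalg.matrix_rank(gram + 1.0 * np.eye(k)))
```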
>>
>> The right value of lambda depends on the data, and even more on
>> what you mean by lambda! There are different usages in different
>> papers. More data means you need less lambda. The effective weight on
>> the regularization (Tikhonov) terms is about 1 in my experience; these
>> terms should be weighted roughly like the loss function terms. But
>> that can mean using values for lambda much smaller than 1, since
>> lambda is just one multiplier of those terms in many formulations.
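To make the "just one multiplier" point concrete: in the ALS-WR formulation (Zhou et al., "weighted-λ-regularization"), each row's penalty is also scaled by that row's number of observations, so a small lambda can still give an effective weight near 1. A tiny sketch; the function name and the `scale_by_count` flag are hypothetical.

```python
def effective_lambda(lam, n_obs, scale_by_count):
    """Multiplier actually applied to ||x_u||^2 for one row of X.

    scale_by_count=False: plain Tikhonov penalty, lam * ||x_u||^2.
    scale_by_count=True:  ALS-WR-style penalty, lam * n_obs * ||x_u||^2,
                          where n_obs is the number of observed entries in the row.
    """
    return lam * n_obs if scale_by_count else lam


# Same effective weight of about 1 on a row with 100 observations:
print(effective_lambda(1.0, 100, scale_by_count=False))  # 1.0
print(effective_lambda(0.01, 100, scale_by_count=True))  # 1.0
```

Under the count-scaled convention, rows with many observations are also regularized more heavily for the same lambda.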
>>
>> The rank has to be greater than the effective rank of the data (of
>> course). It's also something you have to fit to the data
>> experimentally. For normalish data sets of normalish size, the right
>> number of features is probably 20 to 100. I'd test in that range to
>> start.
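The advice above presupposes some estimate of the data's effective rank. One rough way to get one (a heuristic; the thread does not define the term) is to count how many singular values it takes to capture most of the matrix's energy. A NumPy sketch with a hypothetical `effective_rank` helper and an arbitrary 90% threshold:

```python
import numpy as np


def effective_rank(A, energy=0.9):
    """Smallest r whose top-r singular values capture `energy` of ||A||_F^2.

    The energy threshold is an illustrative choice, not a standard definition.
    """
    s = np.linalg.svd(A, compute_uv=False)      # singular values, descending
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(cumulative, energy)) + 1
    return min(r, len(s))


rng = np.random.default_rng(0)
# Synthetic 200-by-100 matrix of exact rank 5 plus a little noise.
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
A += 0.01 * rng.standard_normal(A.shape)
print(effective_rank(A))   # at most 5 for this matrix
```

A real ratings matrix with missing entries would need a dense or imputed stand-in, so this only suggests a lower bound before the experimental sweep.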
>>
>> More features tends to let the model overfit more, so in theory you
>> need more lambda with more features, all else equal.
>>
>> It's *really* something you just have to fit to representative sample
>> data. The optimal answer is way too dependent on the nature,
>> distribution and size of the data to say more than the above.
>>
>>
>> On Tue, Jan 8, 2013 at 8:54 PM, Koobas <koobas@gmail.com> wrote:
>> > Okay, I got a little bit further in my understanding.
>> > The matrix of ratings R is replaced with the binary matrix P.
>> > Then R is used again in regularization.
>> > I get it.
>> > This takes care of situations where you have user-item interactions
>> > but no rating.
>> > So, it can handle explicit feedback, implicit feedback, and mixed
>> > (partial / missing) feedback.
>> > If I have implicit feedback, I just drop R altogether, right?
>> >
>> > Now the only remaining "trick" is Tikhonov regularization,
>> > which leads to a couple of questions:
>> > 1) How much of a problem is overfitting?
>> > 2) How do I pick lambda?
>> > 3) How do I pick the rank of the approximation in the first place?
>> > How does the overfitting problem depend on the rank of the
>> > approximation?
>>
>
>
