Yeah, I've turned that over in my head, and I'm not sure I have a great
answer. But I interpret the net effect to be that the model prefers
simple explanations for active users, at the cost of more error in the
approximation: one would rather pick a basis that more naturally
explains the data observed for active users. I can see how this could
be a useful assumption, since these users are less extremely sparse.
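For concreteness, here is a minimal numpy sketch of the single-user least-squares update under the two schemes being compared: a flat regularization rate versus Zhou et al.'s rate scaled by the number of observed interactions. The function name and the `weighted` flag are just illustrative, not anything from Mahout:

```python
import numpy as np

def als_user_update(V, r_u, lam, weighted=True):
    """Ridge solve for one user's factor vector.

    V        : (n_u, k) item-factor rows for the items this user rated
    r_u      : (n_u,) the user's observed ratings
    lam      : base regularization rate
    weighted : if True, scale lam by n_u as in Zhou et al. (ALS-WR);
               if False, use the flat lam.
    """
    n_u, k = V.shape
    reg = lam * (n_u if weighted else 1.0)
    # Normal equations: (V^T V + reg * I) u = V^T r_u
    A = V.T @ V + reg * np.eye(k)
    b = V.T @ r_u
    return np.linalg.solve(A, b)
```

The data term `V.T @ V` grows with n_u while a flat `lam` stays fixed, so under weighted regularization the penalty keeps pace with the data term; under the flat scheme, users with few observations see a relatively larger penalty.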
On Mon, Jun 16, 2014 at 8:50 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> Probably a question for Sebastian.
>
> As we know, the two papers (Hu/Koren/Volinsky and Zhou et al.) use
> slightly different loss functions.
>
> Zhou et al. are fairly unique in that they additionally multiply the
> norms of the U, V vectors by the number of observed interactions.
>
> The paper doesn't explain why it works, except saying something along
> the lines of "we tried several regularization matrices, and this one
> worked better in our case".
>
> I tried to figure out why that is, and I'm still not sure why it would
> be better. So basically we are saying that, by giving smaller
> observation sets smaller regularization values, it is OK for smaller
> observation sets to overfit slightly more than larger ones.
>
> This seems counterintuitive. Intuition tells us that smaller sets
> would actually tend to overfit more, not less, and therefore might
> call for a larger regularization rate, not a smaller one. Sebastian,
> what's your take on weighting the regularization in ALS-WR?
>
> thanks.
> d
