On Mon, Mar 25, 2013 at 1:41 PM, Koobas <koobas@gmail.com> wrote:
>> But the assumption works nicely for clicklike data. Better still when
>> you can "weakly" prefer to reconstruct the 0 for missing observations
>> and much more strongly prefer to reconstruct the "1" for observed
>> data.
>>
>
> This does seem intuitive.
> How does the benefit manifest itself?
> In lowering the RMSE of reconstructing the interaction matrix?
> Are there any indicators that it results in better recommendations?
> Koobas
In this approach you are no longer reconstructing the interaction
matrix, so there is no RMSE versus the interaction matrix. You're
reconstructing a matrix of 0s and 1s. And because entries are weighted
differently, you're not even minimizing plain RMSE over that matrix --
the point is to take some errors more seriously than others. You're
minimizing a *weighted* RMSE, yes.
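As a rough sketch of what that weighted objective looks like (not any particular implementation -- the names `alpha`, `P`, `C` are illustrative, following the common "confidence = 1 + alpha * count" convention for implicit-feedback data):

```python
import numpy as np

counts = np.array([[3, 0],
                   [0, 7]])               # raw click counts per (user, item)
P = (counts > 0).astype(float)            # the 0/1 matrix being reconstructed
alpha = 40.0
C = 1.0 + alpha * counts                  # weights: 1 for unobserved, large for observed

def weighted_sq_error(P, R, C):
    """Weighted squared error between the 0/1 target P and a reconstruction R."""
    return float(np.sum(C * (P - R) ** 2))

# A toy reconstruction: the same 0.5 everywhere, so every entry has the same
# raw error, but the observed 1s dominate the total cost via their weights.
R = np.full_like(P, 0.5)
err = weighted_sq_error(P, R, C)
```

The factorization then just minimizes this weighted sum (plus regularization) over low-rank R, so a miss on an observed "1" hurts far more than a miss on a weakly-preferred "0".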
Yes, of course the goal is better recommendations. That broader goal
is harder to measure. You can use mean average precision to measure
the tendency to predict back interactions that were held out.
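For concreteness, average precision for one user can be sketched like this (names are illustrative; `ranked` is the recommender's ranked item list and `held_out` the interactions hidden from training):

```python
def average_precision(ranked, held_out):
    """Average precision of a ranked list against a set of held-out items."""
    hits, score = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in held_out:
            hits += 1
            score += hits / k          # precision at each hit position
    return score / len(held_out) if held_out else 0.0
```

Mean average precision is then just the mean of this over all test users; higher means held-out interactions tend to be ranked near the top.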
Is it better? Depends on better than *what*. Applying algorithms that
treat the input like ratings doesn't work as well on click-like data.
The main problem is that they tend to pay too much attention to
large values. For example, if an item was clicked 1000 times and you
are trying to actually reconstruct that "1000", then a 10% error
"costs" (0.1*1000)^2 = 10000. But a 10% error in reconstructing an
item that was clicked once "costs" (0.1*1)^2 = 0.01. The former error
is considered a million times more important than the latter,
even though the intuition is that it's only 1000 times more important.
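You can check that arithmetic directly; the same 10% relative error blows up quadratically with the count:

```python
# Cost of a 10% relative error under squared error, per the example above.
cost_big = (0.1 * 1000) ** 2   # item clicked 1000 times
cost_small = (0.1 * 1) ** 2    # item clicked once

# The ratio comes out to about a million, not the "intuitive" factor of 1000.
ratio = cost_big / cost_small
```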
Is it better than algorithms that ignore the weight entirely? Yes,
probably, if only because you are using more information. But as in
all things, "it depends".
