spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: implicit ALS dataSet
Date Thu, 19 Jun 2014 17:17:44 GMT
On Thu, Jun 19, 2014 at 3:44 PM, redocpot <julien19890118@gmail.com> wrote:
> As the paper says, low ratings get a low confidence weight, so if I
> understand correctly, these dominant one-timers will be *less likely* to
> be recommended compared to other items whose nbPurchase is larger.

Correct, yes.
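To make the weighting concrete: assuming "the paper" is Hu, Koren and Volinsky's "Collaborative Filtering for Implicit Feedback Datasets", each raw count r (here nbPurchase) becomes a binary preference p = 1 if r > 0 and a confidence c = 1 + alpha * r. A minimal Python sketch; the helper name is hypothetical and alpha = 40 is only the paper's value, not a recommendation for this data:

```python
def preference_and_confidence(r: float, alpha: float = 40.0) -> tuple[int, float]:
    """Map a raw implicit count r (e.g. nbPurchase) to (preference, confidence).

    Hypothetical helper using the linear scheme c = 1 + alpha * r.
    """
    p = 1 if r > 0 else 0      # any interaction at all counts as a preference
    c = 1.0 + alpha * r        # more interactions -> more confidence in that preference
    return p, c


if __name__ == "__main__":
    for purchases in (0, 1, 5, 20):
        p, c = preference_and_confidence(purchases)
        print(f"nbPurchase={purchases:>2}  p={p}  c={c:.0f}")
```

With alpha = 40, a one-time purchase gets confidence 41 versus 801 for an item bought 20 times, so one-timers pull far less on the fitted factors; unobserved items still get confidence 1.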


> In fact, lambda is also considered a potential problem: in our case,
> lambda is set to 300, a value confirmed by the test set. Here is the test
> result:

Although people use lambda to mean different things in different
places, in every interpretation I've seen, 300 is extremely high :)
Even 1 is very high.

(For alpha, 1 is the lowest value I'd try; it also depends on the data,
but sometimes higher values work well. For the data set in the
original paper, they used alpha = 40.)
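For intuition about what lambda does, here is a NumPy sketch of the per-user least-squares step from that paper, x_u = (Y^T C_u Y + lambda I)^(-1) Y^T C_u p_u, with item factors Y held fixed. It is not Spark's implementation: implementations may scale lambda differently (for example by the number of ratings), so these values are not directly comparable to the lambda = 300 above. The function name and toy data are illustrative.

```python
import numpy as np


def solve_user_factor(Y: np.ndarray, r_u: np.ndarray, lam: float, alpha: float) -> np.ndarray:
    """One implicit-ALS user update: x_u = (Y^T C_u Y + lam*I)^-1 Y^T C_u p_u.

    Y   : (n_items, k) item-factor matrix, held fixed during this step
    r_u : (n_items,) raw interaction counts for one user
    """
    p_u = (r_u > 0).astype(float)               # binary preferences
    c_u = 1.0 + alpha * r_u                     # confidences c = 1 + alpha * r
    k = Y.shape[1]
    A = Y.T @ (c_u[:, None] * Y) + lam * np.eye(k)   # Y^T C_u Y + lam * I
    b = Y.T @ (c_u * p_u)                            # Y^T C_u p_u
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.normal(scale=0.1, size=(50, 10))
    r_u = np.zeros(50)
    r_u[[3, 7, 19]] = [1, 4, 12]                # toy user with three purchased items
    for lam in (0.01, 1.0, 300.0):
        x_u = solve_user_factor(Y, r_u, lam, alpha=40.0)
        print(f"lambda={lam:>6}: ||x_u|| = {np.linalg.norm(x_u):.4f}")
```

The printed norms shrink as lambda grows: a strong penalty pulls the factors toward zero, trading fit to the observed preferences for smaller weights.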


> where EPR_in is computed on the training set and EPR_out on the test set. It
> seems 300 is the right lambda, since it overfits less.

I take your point about your results though, hm. Can you at least try
a much lower lambda? I'd have to think and speculate about why you might
be observing this effect, but a few more data points could help. It may
be that you've forced the model into basically recommending globally
top items, and that does OK as a local minimum, but personalized
recommendations are better still, with a very different lambda.
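The "globally top items" hypothesis can be checked by scoring a non-personalized popularity ranking with the same metric. This sketch assumes EPR is the expected percentile ranking from the implicit-feedback paper, sum(r_ui * rank_ui) / sum(r_ui), with rank_ui in [0, 1] and 0 meaning the top of user u's list (lower is better; a random ordering averages about 0.5). Function names are hypothetical, data is assumed to be dense user-by-item arrays, and ties are broken by item index.

```python
import numpy as np


def expected_percentile_rank(scores: np.ndarray, r_test: np.ndarray) -> float:
    """EPR = sum(r_ui * rank_ui) / sum(r_ui), rank_ui in [0, 1], 0 = top of u's list.

    scores : (n_users, n_items) predicted scores, higher = recommended first
    r_test : (n_users, n_items) held-out interaction counts
    """
    n_items = scores.shape[1]
    order = np.argsort(-scores, axis=1)              # best item first (ties by index)
    ranks = np.empty_like(order)
    rows = np.arange(scores.shape[0])[:, None]
    ranks[rows, order] = np.arange(n_items)          # list position of each item
    pct = ranks / (n_items - 1)                      # 0.0 (top) .. 1.0 (bottom)
    return float((r_test * pct).sum() / r_test.sum())


def popularity_scores(r_train: np.ndarray) -> np.ndarray:
    """Same global-popularity ranking for every user (a non-personalized baseline)."""
    item_popularity = r_train.sum(axis=0)
    return np.tile(item_popularity, (r_train.shape[0], 1)).astype(float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r_train = rng.poisson(0.3, size=(100, 40))       # toy data, not from the thread
    r_test = rng.poisson(0.1, size=(100, 40))
    print("popularity EPR:", expected_percentile_rank(popularity_scores(r_train), r_test))
```

If the tuned model's EPR_out is close to the popularity baseline's, that would be consistent with the model having collapsed toward global top items.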

Also you might consider holding out the most-favored data as the test
data. It biases the test a bit, but at least you are asking whether
the model ranks highly things that are known to rank highly, rather
than any old thing the user interacted with.
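A minimal sketch of that split, assuming a dense user-by-item count matrix (the function name and the min_items threshold are hypothetical): each user's single highest-count item moves to the test set, while users with fewer than min_items distinct items stay entirely in training so they keep something to learn from.

```python
import numpy as np


def hold_out_top_item(r: np.ndarray, min_items: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """Move each eligible user's most-interacted item from train to test.

    r : (n_users, n_items) interaction counts.
    Users with fewer than min_items distinct items are left wholly in training.
    Ties go to the lowest item index (np.argmax behavior).
    """
    r_train = r.copy()
    r_test = np.zeros_like(r)
    users = np.flatnonzero((r > 0).sum(axis=1) >= min_items)
    top_items = np.argmax(r[users], axis=1)          # each user's most-favored item
    r_test[users, top_items] = r[users, top_items]
    r_train[users, top_items] = 0
    return r_train, r_test


if __name__ == "__main__":
    r = np.array([[1, 5, 2], [0, 0, 3], [4, 4, 1]])
    r_train, r_test = hold_out_top_item(r)
    print(r_train)
    print(r_test)
```

As noted above, this deliberately biases the test toward items each user is known to favor, which is the point of the check.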
