spark-user mailing list archives

From redocpot <>
Subject Re: implicit ALS dataSet
Date Thu, 19 Jun 2014 14:44:50 GMT
One thing that needs to be mentioned is that the schema is actually (userId,
itemId, nbPurchase), where nbPurchase plays the role of the rating. I found
that there are many one-timers, i.e. pairs with nbPurchase = 1. These pairs
make up about 85% of all positive observations.

As the paper says, low ratings get a low confidence weight, so if I
understand correctly, these dominant one-timers will be *less likely* to be
recommended compared to items with a bigger nbPurchase.
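If I read the paper (Hu, Koren & Volinsky 2008, which implicit ALS is based
on) correctly, the confidence weight is c_ui = 1 + alpha * r_ui. A quick
sketch of what that means for the one-timers, under that assumption:

```scala
// Confidence weighting from the implicit-feedback ALS paper:
// c_ui = 1 + alpha * r_ui, with nbPurchase playing the role of r_ui.
val alpha = 1.0
def confidence(nbPurchase: Double): Double = 1.0 + alpha * nbPurchase

// With alpha = 1, a one-timer gets confidence 2, while the heaviest
// pair in our data (nbPurchase = 1396) gets confidence 1397.
println(confidence(1.0))    // 2.0
println(confidence(1396.0)) // 1397.0
```

So the 85% of one-timers sit at the very bottom of the confidence scale,
which would explain why they are rarely recommended.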

In fact, lambda is also considered a potential problem: in our case, lambda
is set to 300, which is confirmed on the test set. Here are the test
results:

*lambda = 65
EPR_in  = 0.06518592593142056
EPR_out = 0.14789338884259276

lambda = 100
EPR_in  = 0.06619274171311466
EPR_out = 0.13494609978226865

lambda = 300
EPR_in  = 0.08814703345418627
EPR_out = 0.09522125434156471*

where EPR_in is computed on the training set and EPR_out on the test set. It
seems 300 is the right lambda, since it overfits the least.
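For what it's worth, that choice reads off directly from the overfitting gap
(EPR_out - EPR_in) in the numbers above; a throwaway sketch:

```scala
// Overfitting gap (EPR_out - EPR_in) for each lambda tried above;
// the smallest gap indicates the least overfitting.
val results = Seq(
  (65.0,  0.06518592593142056, 0.14789338884259276),
  (100.0, 0.06619274171311466, 0.13494609978226865),
  (300.0, 0.08814703345418627, 0.09522125434156471)
)
for ((lambda, eprIn, eprOut) <- results)
  println(f"lambda = $lambda%5.0f  gap = ${eprOut - eprIn}%.4f")
```

lambda = 300 narrows the gap to about 0.007, versus 0.08 and 0.07 for the
smaller lambdas, even though its in-sample EPR is slightly worse.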

Some other parameters are shown in the following code:

*val model = new ALS()
      .setImplicitPrefs(implicitPrefs = true)
      .setLambda(300.0)
      .setAlpha(1.0)*

We set alpha to 1, since the max nbPurchase is 1396. Not sure whether that
alpha is already too big.

