spark-user mailing list archives

From redocpot <>
Subject Re: implicit ALS dataSet
Date Thu, 19 Jun 2014 14:03:12 GMT

Recently, I ran an implicit ALS test on a real-world data set.

We have 2 data sets: one contains the purchase records from the past 3 years
(training set), and the other contains the purchase records from the 6 months
immediately after those 3 years (test set).

The database has 1060080 users and 23880 items.

Following the paper on which the MLlib ALS implementation is based, we use
expected percentile rank (EPR) to evaluate the recommendation performance. We
get an EPR of about 8% - 9%, which is considered a good result in the paper.
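
A minimal sketch of how EPR can be computed, assuming the definition from the
implicit-feedback paper: each test-set (user, item) pair contributes the
item's percentile position in that user's ranked list (0% for the top item,
100% for the last), weighted by the observed count in the test set. Lower is
better, and random recommendations give about 50%. The function name and the
plain-dict input layout are illustrative assumptions, not the code used for
the test described here.

```python
def expected_percentile_rank(ranked_items, test_counts):
    """Weighted average percentile rank of test-set items.

    ranked_items: dict user -> list of items, most preferred first
                  (hypothetical layout; every test item must appear in it).
    test_counts:  dict (user, item) -> observed count in the test set.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for (user, item), count in test_counts.items():
        ranking = ranked_items[user]
        position = ranking.index(item)  # 0 for the top-ranked item
        # Percentile in [0, 1]: 0.0 = top of the list, 1.0 = bottom.
        percentile = position / (len(ranking) - 1) if len(ranking) > 1 else 0.0
        weighted_sum += count * percentile
        total_weight += count
    return weighted_sum / total_weight


# Tiny illustrative example (made-up data).
ranked = {"u1": ["a", "b", "c"], "u2": ["c", "b", "a"]}
test = {("u1", "a"): 2, ("u2", "a"): 1}
epr = expected_percentile_rank(ranked, test)
```

In the toy example, item "a" sits at the top of u1's list and at the bottom
of u2's list, so the count-weighted average is (2 x 0 + 1 x 1) / 3.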

We did some sanity checks. For example, each user has his own item list
sorted by predicted preference, and we pick the top 10 items for each user.
As a result, we found only 169 distinct items among the (1060080 x 10) items
picked; most of them are repeated. That means that, given any 2 users, the
recommended items may well be the same. Nothing is personalized.
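
The sanity check above can be sketched as a catalogue-coverage count: collect
the top-k items of every user and count how many distinct items appear. This
is a minimal sketch under the assumption that the ranked lists fit in a plain
dict; the function name and inputs are hypothetical.

```python
def top_k_coverage(ranked_items, k, num_items):
    """Count distinct items across all users' top-k lists.

    ranked_items: dict user -> list of items, most preferred first.
    Returns (number of distinct items, fraction of the catalogue covered).
    """
    distinct = set()
    for ranking in ranked_items.values():
        distinct.update(ranking[:k])
    return len(distinct), len(distinct) / num_items


# Tiny illustrative example (made-up data): 3 users, catalogue of 5 items.
ranked = {
    "u1": ["a", "b", "c", "d", "e"],
    "u2": ["a", "b", "d", "c", "e"],
    "u3": ["b", "a", "e", "c", "d"],
}
n_distinct, coverage = top_k_coverage(ranked, k=2, num_items=5)
```

With the numbers reported above, 169 distinct items out of 23880 is a
coverage of roughly 0.7% of the catalogue.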

It seems that the system is focusing on the best-sellers, or something like
that. What we want is to recommend as many different items as possible. That
would make the recommender system more reasonable.

Is this a common case for ALS?


