spark-user mailing list archives

From redocpot <julien19890...@gmail.com>
Subject Re: implicit ALS dataSet
Date Mon, 23 Jun 2014 10:03:36 GMT
Hi, 

The real-world dataset is somewhat larger, so I tested on the MovieLens
data set and found the same results:


  alpha   lambda   rank   top1   top5   EPR_in    EPR_out
  40      0.001    50     297    559    0.05855   0.17299
  40      0.01     50     295    559    0.05854   0.17298
  40      0.1      50     296    560    0.05846   0.17287
  40      1        50     309    564    0.05819   0.17227
  40      25       50     287    537    0.05699   0.14855
  40      50       50     267    496    0.05795   0.13389
  40      100      50     247    444    0.06504   0.11920
  40      200      50     145    306    0.09558   0.11388
  40      300      50     77     178    0.11340   0.12264



To be clear, there are 1650 items in this MovieLens data set. Top1 and top5
in the table are the number of distinct items that appear in the top 1 and
top 5 positions of the per-user preference lists after ALS has run. Top1,
top5 and EPR_in are computed on the training set; only EPR_out is computed on
the test set. For top1 and top5, all items are taken into account, whether or
not the user has purchased them.
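
For anyone who wants to reproduce these columns, here is a minimal sketch of
how they could be computed. It is not the code I ran on Spark. It assumes each
user's preference list is already available as a best-first list of item ids,
and it assumes EPR means the expected percentile ranking commonly used for
implicit feedback (0.0 = every observed item at the top of the list, 1.0 = at
the bottom). The function names and the toy data are made up for illustration.

```python
from typing import Dict, List, Set


def distinct_top_k(ranked: Dict[str, List[str]], k: int) -> int:
    """Count the distinct items found in the first k positions of any user's list."""
    seen: Set[str] = set()
    for items in ranked.values():
        seen.update(items[:k])
    return len(seen)


def expected_percentile_rank(
    ranked: Dict[str, List[str]],
    observed: Dict[str, Dict[str, float]],
) -> float:
    """Weighted mean percentile position of the observed (user, item) pairs.

    ranked:   user -> item ids, best first (assumed to cover every observed item)
    observed: user -> {item id: interaction weight}
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for user, weights in observed.items():
        items = ranked[user]
        n = len(items)
        position = {item: idx for idx, item in enumerate(items)}
        for item, weight in weights.items():
            # Percentile of the item in this user's list: 0.0 is the top slot.
            pct = position[item] / (n - 1) if n > 1 else 0.0
            weighted_sum += weight * pct
            total_weight += weight
    return weighted_sum / total_weight if total_weight else float("nan")


# Toy data, purely illustrative.
ranked = {
    "u1": ["a", "b", "c", "d", "e"],
    "u2": ["a", "c", "b", "e", "d"],
}
observed = {"u1": {"a": 1.0}, "u2": {"d": 1.0}}

print(distinct_top_k(ranked, 1))
print(distinct_top_k(ranked, 2))
print(expected_percentile_rank(ranked, observed))
```

EPR_in would pass the training interactions as `observed`, and EPR_out the
held-out test interactions, with the same ranked lists in both cases.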

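The table shows that a small lambda (< 1) always leads to overfitting
(EPR_in is about 0.058 while EPR_out is about 0.173), while a large lambda
such as 300 removes the overfitting but leaves very few distinct items in the
top 1 and top 5 of the preference lists (77 and 178 out of 1650), i.e. the
recommendations are not personalized.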
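
To make the trade-off easier to see, the numbers from the table can be turned
into two derived quantities: the gap EPR_out - EPR_in, used here as a rough
proxy for overfitting, and the share of the 1650 items that show up in the
top 1 and top 5. Both derived quantities and the helper name `summarize` are
just one way of reading the table, not something ALS reports.

```python
N_ITEMS = 1650  # number of items in this MovieLens data set

# (lambda, top1, top5, EPR_in, EPR_out), copied from the table; alpha=40, rank=50
rows = [
    (0.001, 297, 559, 0.05855, 0.17299),
    (0.01, 295, 559, 0.05854, 0.17298),
    (0.1, 296, 560, 0.05846, 0.17287),
    (1, 309, 564, 0.05819, 0.17227),
    (25, 287, 537, 0.05699, 0.14855),
    (50, 267, 496, 0.05795, 0.13389),
    (100, 247, 444, 0.06504, 0.11920),
    (200, 145, 306, 0.09558, 0.11388),
    (300, 77, 178, 0.11340, 0.12264),
]


def summarize(rows, n_items):
    """Derive the train/test EPR gap and the top-k item coverage per lambda."""
    summary = []
    for lam, top1, top5, epr_in, epr_out in rows:
        summary.append(
            {
                "lambda": lam,
                "epr_gap": epr_out - epr_in,      # larger gap = more overfitting
                "top1_coverage": top1 / n_items,  # share of items seen at rank 1
                "top5_coverage": top5 / n_items,  # share of items seen in top 5
            }
        )
    return summary


summary = summarize(rows, N_ITEMS)
for s in summary:
    print(
        f"lambda={s['lambda']:>7g}  gap={s['epr_gap']:.5f}  "
        f"top1={s['top1_coverage']:.1%}  top5={s['top5_coverage']:.1%}"
    )
```

The gap shrinks as lambda grows, but so does the coverage, which is exactly
the problem described above.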





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/implicit-ALS-dataSet-tp7067p8115.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
