spark-dev mailing list archives

From Michael Malak <michaelma...@yahoo.com.INVALID>
Subject Wrong initial bias in GraphX SVDPlusPlus?
Date Fri, 03 Apr 2015 15:41:56 GMT
I believe the bias initialization in GraphX SVDPlusPlus is incorrect. Specifically, at this line:
https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala#L96

instead of 
(vd._1, vd._2, msg.get._2 / msg.get._1, 1.0 / scala.math.sqrt(msg.get._1)) 
it should be 
(vd._1, vd._2, msg.get._2 / msg.get._1 - u, 1.0 / scala.math.sqrt(msg.get._1)) 
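For illustration, here is a standalone sketch of the proposed initialization, outside Spark. It assumes that the vertex attribute has the same Tuple4 layout as the line above. It also assumes that msg carries (ratingCount, ratingSum) aggregated over the vertex's edges, which is consistent with 1.0 / sqrt(msg.get._1) being the |N(u)|^(-1/2) normalizer. The object name BiasInit and the function initVertex are hypothetical, not part of GraphX.

```scala
object BiasInit {
  // Assumed vertex attribute layout, mirroring the Tuple4 in SVDPlusPlus:
  // (factor vector, y vector, bias, 1/sqrt(number of ratings))
  type VD = (Array[Double], Array[Double], Double, Double)

  // Hypothetical helper: msg is assumed to be Some((ratingCount, ratingSum));
  // like the original code, it calls .get, so every vertex must have edges.
  def initVertex(vd: VD, msg: Option[(Long, Double)], u: Double): VD = {
    val (count, sum) = msg.get
    // Bias = the vertex's mean rating expressed as an offset from the global mean u
    (vd._1, vd._2, sum / count - u, 1.0 / math.sqrt(count.toDouble))
  }
}
```

For example, a user with ratings 4, 5 and 3 yields msg = Some((3L, 12.0)). With u = 3.5, the initial bias is 4.0 - 3.5 = 0.5 instead of 4.0.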

That is, the biases bu and bi (both stored as the third component of the Tuple4 above,
depending on whether the vertex is a user or an item), described in equation (1) of the Koren
paper, are supposed to be small offsets from the global mean (the variable u, standing for
the Greek letter mu) that account for the peculiarities of individual users and items. Since
msg.get._2 / msg.get._1 is the vertex's average rating, the current code initializes each bias
to the full mean rather than to its deviation from u.
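To show why the offset matters, here is a small sketch of the baseline estimate u + bu + bi on a hypothetical toy rating set. The data and the BaselineDemo names are invented for illustration only. The sketch compares biases initialized as offsets from u with biases initialized to the raw per-user and per-item means, which is what the current line effectively does.

```scala
object BaselineDemo {
  // Hypothetical toy ratings: (user, item, rating)
  val ratings: Seq[(Int, Int, Double)] =
    Seq((1, 10, 4.0), (1, 11, 5.0), (2, 10, 3.0), (2, 11, 4.0))

  // Global mean rating (the u / mu of the message)
  val u: Double = ratings.map(_._3).sum / ratings.size

  // Mean rating grouped by user or by item
  private def meanBy(key: ((Int, Int, Double)) => Int): Map[Int, Double] =
    ratings.groupBy(key).map { case (k, rs) => k -> rs.map(_._3).sum / rs.size }

  val userMean: Map[Int, Double] = meanBy(_._1)
  val itemMean: Map[Int, Double] = meanBy(_._2)

  // Baseline estimate: global mean plus user and item biases
  def predict(bu: Double, bi: Double): Double = u + bu + bi
}
```

With these numbers u is 4.0. The offset biases for user 1 and item 10 are 0.5 and -0.5, so the baseline for that pair is 4.0, matching the observed rating. Plugging in the raw means (4.5 and 3.5) instead gives 12.0: each uncorrected bias carries an extra u, so the starting point overshoots by 2u.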

In theory, initializing these biases to the wrong values should not matter given enough iterations
of the algorithm. However, some quick empirical testing shows that it has trouble converging at all,
even after orders of magnitude more iterations.

This could be the source of previously reported trouble with SVDPlusPlus:
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-SVDPlusPlus-problem-td12885.html


If after a day, no one tells me I'm crazy here, I'll go ahead and create a Jira ticket. 


