spark-user mailing list archives

From Ankur Dave <ankurd...@gmail.com>
Subject Re: Pagerank implementation
Date Tue, 18 Nov 2014 09:20:39 GMT
At 2014-11-15 18:01:22 -0700, tom85 <tom.manhardt@gmail.com> wrote:
> This line: val newPR = oldPR + (1.0 - resetProb) * msgSum
> makes no sense to me. Should it not be:
> val newPR = resetProb/graph.vertices.count() + (1.0 - resetProb) * msgSum 
> ?

This is an unusual version of PageRank where the messages being passed around are deltas rather
than full ranks. The delta is computed at line 156 in vertexProgram, which returns (newPR, newPR - oldPR).
The second element of the tuple is the delta, which sendMessage then passes along to the neighbors.

The benefit of this is that sendMessage can avoid sending when the delta drops below the convergence
threshold `tol`, indicating that the source vertex has converged. But it means that to update
the rank of each vertex, we have to add the incoming delta to its existing rank. That's why
the oldPR term appears in the line you're looking at.
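
To make the delta idea concrete, here is a minimal sketch in plain Scala collections. It is not the
GraphX code: the object name, the toy edge list, the loop structure, and the starting state (every
vertex begins with rank and delta equal to resetProb) are all assumptions for illustration. Only the
update line and the `tol` check are taken from the discussion above.

```scala
object DeltaPageRankSketch {
  // Hypothetical toy graph: directed edges as (src, dst) pairs, no dangling vertices.
  val edges: Seq[(Int, Int)] = Seq((1, 2), (1, 3), (2, 3), (3, 1))
  val vertices: Seq[Int] = edges.flatMap { case (s, d) => Seq(s, d) }.distinct.sorted

  def run(resetProb: Double = 0.15, tol: Double = 1e-6, maxIter: Int = 1000): Map[Int, Double] = {
    val outDegree: Map[Int, Int] = edges.groupBy(_._1).map { case (v, es) => (v, es.size) }

    // Per-vertex state is (rank, last delta).
    // Assumed starting point: rank = resetProb and delta = resetProb for every vertex.
    var state: Map[Int, (Double, Double)] =
      vertices.map(v => (v, (resetProb, resetProb))).toMap

    var iter = 0
    var active = true
    while (active && iter < maxIter) {
      // "sendMessage": a source only sends while its last delta is still above tol.
      val msgs: Seq[(Int, Double)] = edges.collect {
        case (src, dst) if state(src)._2 > tol => (dst, state(src)._2 / outDegree(src))
      }
      active = msgs.nonEmpty

      // Message combiner: sum the incoming deltas per destination vertex.
      val msgSum: Map[Int, Double] =
        msgs.groupBy(_._1).map { case (v, ms) => (v, ms.map(_._2).sum) }

      // "vertexProgram": add the damped incoming delta to the existing rank.
      state = state.map { case (v, (oldPR, _)) =>
        val newPR = oldPR + (1.0 - resetProb) * msgSum.getOrElse(v, 0.0)
        (v, (newPR, newPR - oldPR))
      }
      iter += 1
    }
    state.map { case (v, (pr, _)) => (v, pr) }
  }

  def main(args: Array[String]): Unit =
    run().toSeq.sortBy(_._1).foreach { case (v, pr) => println(s"$v -> $pr") }
}
```

In this sketch the resetProb term enters exactly once, through the starting state, and every later
step only adds deltas. Summed over all iterations, each rank ends up (to within roughly `tol`)
satisfying rank = resetProb + (1.0 - resetProb) * (sum of incoming rank / outDegree), so the ranks
are not normalized to sum to 1.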

Ankur
