spark-user mailing list archives

From tom85
Subject Re: Pagerank implementation
Date Tue, 18 Nov 2014 16:40:53 GMT
I see, thanks.

So, to implement PageRank with the damping factor divided by the number of
vertices: is it sufficient to change initialMessage to
*val initialMessage = (resetProb / graph.vertices.count()) / (1.0 - resetProb)*
instead of
*val initialMessage = resetProb / (1.0 - resetProb)*
for it to yield correct results?
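
One way to sanity-check the idea outside of Spark: the PageRank recurrence is linear in the reset term, so replacing resetProb with resetProb / N in that term should scale every rank by 1 / N. Below is a minimal sketch in plain Scala collections, not GraphX. The toy edge list and the names NormalizedPageRankSketch, pageRank and base are made up for illustration, and the sketch does not model how the Pregel vertex program consumes initialMessage, so it only checks the math, not the GraphX change itself.

```scala
object NormalizedPageRankSketch {
  // Hypothetical toy graph: directed edges as (src, dst) pairs; every vertex has an out-edge.
  val edges: Seq[(Int, Int)] = Seq((1, 2), (1, 3), (2, 3), (3, 1))
  val vertices: Seq[Int] = edges.flatMap { case (s, d) => Seq(s, d) }.distinct.sorted
  val outDegree: Map[Int, Int] = edges.groupBy(_._1).map { case (v, es) => v -> es.size }

  // Power iteration for
  //   rank(v) = base + (1 - resetProb) * sum over in-edges of rank(src) / outDegree(src)
  // base = resetProb gives the unnormalized variant, base = resetProb / N the normalized one.
  def pageRank(base: Double, resetProb: Double, numIter: Int): Map[Int, Double] = {
    var ranks: Map[Int, Double] = vertices.map(v => v -> base).toMap
    for (_ <- 1 to numIter) {
      // Sum the contributions arriving at each destination vertex.
      val contribs: Map[Int, Double] = edges
        .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
        .groupBy(_._1)
        .map { case (dst, cs) => dst -> cs.map(_._2).sum }
      ranks = vertices
        .map(v => v -> (base + (1.0 - resetProb) * contribs.getOrElse(v, 0.0)))
        .toMap
    }
    ranks
  }

  def main(args: Array[String]): Unit = {
    val resetProb = 0.15
    val n = vertices.size
    val unnormalized = pageRank(resetProb, resetProb, 50)
    val normalized = pageRank(resetProb / n, resetProb, 50)
    // Print both variants side by side; the second column should be the first divided by n.
    vertices.foreach { v =>
      println(f"$v%d: ${unnormalized(v)}%.6f ${normalized(v)}%.6f")
    }
  }
}
```

On this toy graph the normalized ranks are the unnormalized ones divided by N, and their sum approaches 1 as the number of iterations grows. If a convergence tolerance is used, it presumably needs to be scaled by 1 / N as well.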

Another question:
I load a graph and specify the number of partitions to use (which should be
some multiple of the total number of cores, i.e. number of machines * number
of cores per machine?). The partitions can be seen in the Spark UI after
loading the graph. However, while PageRank is running, the number of RDDs
increases significantly over the runtime of the algorithm (their total size
even exceeds the size of the input graph).
Is this due to the read-only nature of RDDs? In each iteration, are new RDDs
created to store the intermediate PageRank results?
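
To make the second question concrete, here is a rough sketch of the pattern I have in mind, using only the core RDD API in local mode. It is not the GraphX PageRank code; the object name IterativeRddSketch, the toy data and the update formula are made up for illustration. Every transformation returns a new immutable RDD with its own id, so an iterative job that caches each step accumulates cached RDDs unless the previous one is unpersisted.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.collection.mutable.ArrayBuffer

object IterativeRddSketch {
  // Runs a toy iterative update and returns the id of the RDD held after each step.
  def run(sc: SparkContext, numIter: Int): Seq[Int] = {
    var ranks: RDD[(Long, Double)] =
      sc.parallelize(Seq((1L, 1.0), (2L, 1.0), (3L, 1.0))).cache()
    ranks.count() // materialize the initial RDD
    val ids = ArrayBuffer(ranks.id)
    for (_ <- 1 to numIter) {
      val prev = ranks
      // RDDs are immutable: map returns a new RDD instead of updating prev in place.
      ranks = prev.map { case (id, r) => (id, 0.15 + 0.85 * r) }.cache()
      ranks.count()                    // materialize the new RDD before dropping its parent
      prev.unpersist(blocking = false) // release the previous iteration's cached blocks
      ids += ranks.id
    }
    ids.toSeq
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("iterative-rdd-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val ids = run(sc, 5)
      println(s"RDD ids per iteration: ${ids.mkString(", ")}")
      println(s"Still cached at the end: ${sc.getPersistentRDDs.keys.mkString(", ")}")
    } finally {
      sc.stop()
    }
  }
}
```

Without the unpersist call, each iteration's RDD would stay cached, which would look like the growth I see in the Spark UI.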
