spark-user mailing list archives

From txw <...@outlook.com>
Subject Re: Is spark suitable for large scale pagerank, such as 200 million nodes, 2 billion edges?
Date Sun, 18 Jan 2015 06:18:02 GMT
I've read these pages. In the paper "GraphX: Graph Processing in a Distributed Dataflow Framework", the authors claim that it takes only 400 seconds for the uk-2007-05 dataset, which is similar in size to my dataset. Is the current GraphX the same version as the GraphX in that paper? And how many partitions did the experiment use for the uk-2007-05 dataset? I tried 16 and 192 partitions, and both got stuck.
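For reference, a minimal sketch of how such a run might be set up with GraphX, controlling the edge partition count and partition strategy. The input/output paths and the tolerance value are illustrative assumptions, not values from this thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

object PageRankSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pagerank-sketch"))

    // Load an edge list with an explicit number of edge partitions
    // (192 here, matching one of the counts tried above), then repartition
    // with 2D edge partitioning to bound per-vertex communication.
    val graph = GraphLoader
      .edgeListFile(sc, "hdfs:///data/uk-2007-05/edges.txt", false, 192) // hypothetical path
      .partitionBy(PartitionStrategy.EdgePartition2D)

    // Run PageRank until ranks change by less than the tolerance.
    val ranks = graph.pageRank(0.001).vertices
    ranks.saveAsTextFile("hdfs:///out/ranks") // hypothetical output path
    sc.stop()
  }
}
```

At this scale, the partition strategy often matters as much as the partition count; EdgePartition2D is commonly suggested for power-law web graphs because it caps vertex replication.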




Original message
From: Ted Yu <yuzhihong@gmail.com>
To: txw <txw@outlook.com>
Cc: user <user@spark.apache.org>
Sent: Friday, January 16, 2015, 02:23
Subject: Re: Is spark suitable for large scale pagerank, such as 200 million nodes, 2 billion edges?


Have you seen http://search-hadoop.com/m/JW1q5pE3P12 ?


Please also take a look at the end-to-end performance graph on http://spark.apache.org/graphx/


Cheers


On Thu, Jan 15, 2015 at 9:29 AM, txw <txw@outlook.com> wrote:

Hi,


I am running PageRank on a large dataset, which includes 200 million nodes and 2 billion edges.
Is Spark suitable for large-scale PageRank? How many cores and how much memory do I need, and how long will it take?


Thanks


Xuewei Tang