spark-user mailing list archives

From Ankur Dave <ankurd...@gmail.com>
Subject Re: [GraphX] The best way to construct a graph
Date Fri, 01 Aug 2014 07:33:29 GMT
At 2014-08-01 11:23:49 +0800, Bin <wubin_phight@126.com> wrote:
> I am wondering what is the best way to construct a graph?
>
> Say I have some attributes for each user, and specific weight for each user pair. The
> way I am currently doing is first read user information and edge triple into two arrays, then
> use sc.parallelize to create vertexRDD and edgeRDD, respectively. Then create the graph using
> Graph(vertices, edges).
>
> I wonder whether there is a better way to do this?

That's a perfectly fine way to construct a graph. Are you encountering a problem with it?
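For reference, here is a minimal Scala sketch of the approach described in the question. It assumes a String name per user and a Double weight per user pair; those attributes and the ParallelizeGraph object are hypothetical. Both arrays sit in the driver's memory before sc.parallelize distributes them.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object ParallelizeGraph {
  // Builds the graph from in-memory arrays, as in the question.
  // The String vertex attribute and Double edge weight are hypothetical.
  def build(sc: SparkContext): Graph[String, Double] = {
    // Per-user attributes, already held in the driver's memory.
    val users: Array[(VertexId, String)] =
      Array((1L, "alice"), (2L, "bob"), (3L, "carol"))
    // Weighted user pairs as Edge(srcId, dstId, attr).
    val pairs: Array[Edge[Double]] =
      Array(Edge(1L, 2L, 0.5), Edge(2L, 3L, 1.5))

    // sc.parallelize ships the driver-side arrays out to the cluster.
    val vertices: RDD[(VertexId, String)] = sc.parallelize(users)
    val edges: RDD[Edge[Double]] = sc.parallelize(pairs)
    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("parallelize-graph").setMaster("local[*]"))
    try {
      val graph = build(sc)
      println(s"vertices=${graph.numVertices} edges=${graph.numEdges}")
    } finally {
      sc.stop()
    }
  }
}
```

This works well for small inputs; the limitation is only that everything passes through the driver first.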

The only suggestion I would make is to load the data using sc.textFile rather than reading
it into an array and calling sc.parallelize. That way each executor reads its share of the
input directly, so the full dataset never has to fit in the driver's memory.
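A sketch of the sc.textFile variant, assuming two hypothetical comma-separated input files: vertices as "id,name" and edges as "srcId,dstId,weight". The TextFileGraph object and these file layouts are illustrative only. Parsing happens inside map on the executors, so nothing is collected to the driver.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object TextFileGraph {
  // Hypothetical input formats (one record per line, comma-separated):
  //   vertices file: id,name
  //   edges file:    srcId,dstId,weight
  def load(sc: SparkContext, vertexPath: String, edgePath: String): Graph[String, Double] = {
    // Each partition of the file is read and parsed on an executor.
    val vertices: RDD[(VertexId, String)] = sc.textFile(vertexPath)
      .filter(_.trim.nonEmpty)
      .map { line =>
        val Array(id, name) = line.split(",", 2)
        (id.trim.toLong, name.trim)
      }
    val edges: RDD[Edge[Double]] = sc.textFile(edgePath)
      .filter(_.trim.nonEmpty)
      .map { line =>
        val Array(src, dst, w) = line.split(",")
        Edge(src.trim.toLong, dst.trim.toLong, w.trim.toDouble)
      }
    // Same constructor as before; only the data source changed.
    Graph(vertices, edges)
  }
}
```

Malformed lines would throw a MatchError here; real input may call for more defensive parsing.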

GraphLoader does have the slight advantage that it avoids allocating a pair per vertex, but
this is unlikely to be a big cost, so it's fine to use Graph(vertices, edges) if GraphLoader
isn't suitable.
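For comparison, a sketch using GraphLoader.edgeListFile, which reads lines of the form "srcId dstId" (whitespace-separated, lines starting with # are skipped) and sets every vertex and edge attribute to 1. Per-user attributes can then be attached with outerJoinVertices. The EdgeListGraph object and the "unknown" placeholder are hypothetical. Since the loader has no column for edge weights, weighted user pairs like those in the question are a case where Graph(vertices, edges) is the better fit.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Graph, GraphLoader, VertexId}
import org.apache.spark.rdd.RDD

object EdgeListGraph {
  // edgeListFile builds the structure only: every vertex and edge attribute is 1.
  def load(sc: SparkContext, edgePath: String,
           users: RDD[(VertexId, String)]): Graph[String, Int] = {
    val structure: Graph[Int, Int] = GraphLoader.edgeListFile(sc, edgePath)
    // Attach per-user attributes afterwards; vertices missing from `users`
    // get a hypothetical "unknown" placeholder.
    structure.outerJoinVertices(users) { (_, _, nameOpt) =>
      nameOpt.getOrElse("unknown")
    }
  }
}
```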

Ankur
