spark-user mailing list archives

From Ankur Dave <ankurd...@gmail.com>
Subject Re: GraphX vertices and connected edges
Date Fri, 02 May 2014 21:15:24 GMT
Do you mean you want to obtain a list of adjacent edges for every vertex? A
mapReduceTriplets followed by a join is the right way to do this. The join
will be cheap because the original and derived vertices will share indices.
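
For example, here is a rough, untested sketch on a made-up toy graph (the
graph, object name, and property types below are just for illustration; it
uses the GraphX mapReduceTriplets API):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

object AdjacentEdgesExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("adjacent-edges").setMaster("local[*]"))

    // Toy graph: three vertices with String properties, Int edge attributes.
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 10), Edge(2L, 3L, 20)))
    val graph = Graph(vertices, edges)

    // Send each edge to both of its endpoints, then concatenate per vertex.
    val adjacentEdges: VertexRDD[Array[Edge[Int]]] =
      graph.mapReduceTriplets[Array[Edge[Int]]](
        triplet => {
          val e = Edge(triplet.srcId, triplet.dstId, triplet.attr)
          Iterator((triplet.srcId, Array(e)), (triplet.dstId, Array(e)))
        },
        (a, b) => a ++ b)

    // Join back with the original vertex properties. This join is cheap
    // because both VertexRDDs share the same index.
    val verticesWithEdges: VertexRDD[(String, Array[Edge[Int]])] =
      graph.vertices.innerJoin(adjacentEdges) { (_, attr, adj) => (attr, adj) }

    verticesWithEdges.collect().foreach { case (id, (attr, adj)) =>
      println(s"$id ($attr): ${adj.mkString(", ")}")
    }
    sc.stop()
  }
}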

There's a built-in function to do this for neighboring vertex properties
called GraphOps.collectNeighbors<http://spark.apache.org/docs/latest/api/graphx/index.html#org.apache.spark.graphx.GraphOps@collectNeighbors(EdgeDirection):VertexRDD[Array[(VertexId,VD)]]>.
In the latest version of GraphX there's also
GraphOps.collectEdges<https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala#L140>,
but that doesn't do the join.
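
For reference, they'd be called roughly like this (assuming the same toy graph
of type Graph[String, Int] from the sketch above; EdgeDirection.Either is just
one choice of direction):

// Neighboring vertex properties, one array per vertex.
val neighbors: VertexRDD[Array[(VertexId, String)]] =
  graph.collectNeighbors(EdgeDirection.Either)

// Adjacent edges, one array per vertex.
val adjacent: VertexRDD[Array[Edge[Int]]] =
  graph.collectEdges(EdgeDirection.Either)

// collectEdges returns only the edges, so if you also want the vertex
// property you'd do the (index-sharing, hence cheap) join yourself:
val adjacentWithProps: VertexRDD[(String, Array[Edge[Int]])] =
  graph.vertices.innerJoin(adjacent) { (_, attr, adj) => (attr, adj) }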

As the docs for these functions note, collecting all neighbors or edges is
memory-intensive for high-degree vertices, so if possible it's better to
restructure the computation to avoid materializing them per vertex.

Ankur <http://www.ankurdave.com/>


On Fri, May 2, 2014 at 11:34 AM, Kyle Ellrott <kellrott@soe.ucsc.edu> wrote:

> What is the most efficient way to get an RDD of GraphX vertices and their
> connected edges?
>
