spark-user mailing list archives

From Robin East <>
Subject Re: Get all vertexes with outDegree equals to 0 with GraphX
Date Fri, 26 Feb 2016 13:16:04 GMT
Whilst I can think of other ways to do it, I don’t think they would be conceptually or syntactically
any simpler. GraphX doesn’t have the concept of built-in vertex properties (such as degree), which would make
this simpler: a vertex in GraphX is a vertex ID (a Long) plus whatever custom attributes
you assign. This means you have to find a way of ‘pushing’ the vertex degree into the
graph so you can do comparisons (cf. a join in a relational database) or, as you have done, create
a list and filter against that (cf. filtering against a sub-query in a relational database).
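
To make the join-style variant concrete, here is a minimal sketch, assuming Spark's Scala API with GraphX on the classpath and a local SparkContext. It uses outerJoinVertices to push each vertex's out-degree into its attribute and then filters on that attribute. Because outDegrees has no entry for vertices without outgoing edges, the missing case defaults to 0. The object name ZeroOutDegreeByJoin, the helpers zeroOutDegreeIds and sampleGraph, and the plain String vertex attribute (instead of the pair of lists in the question) are hypothetical simplifications.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Hypothetical helper object: a sketch, not part of the GraphX API.
object ZeroOutDegreeByJoin {
  // Join-style approach: push the out-degree into each vertex attribute, then filter.
  def zeroOutDegreeIds[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): RDD[VertexId] = {
    // outDegrees only contains vertices with at least one out-edge,
    // so a missing entry (None) means an out-degree of 0.
    val withOutDegree: Graph[Int, ED] =
      graph.outerJoinVertices(graph.outDegrees) { (_, _, degOpt) => degOpt.getOrElse(0) }
    withOutDegree.vertices.filter { case (_, deg) => deg == 0 }.map(_._1)
  }

  // The graph from the question, with a String vertex attribute (assumed simplification).
  def sampleGraph(sc: SparkContext): Graph[String, Boolean] = {
    val vertices: RDD[(VertexId, String)] = sc.parallelize(
      Seq((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d"), (5L, "e"), (6L, "f")))
    val edges: RDD[Edge[Boolean]] = sc.parallelize(
      Seq(Edge(1L, 2L, true), Edge(2L, 3L, true), Edge(3L, 4L, true), Edge(5L, 2L, true)))
    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("zero-out-degree"))
    try {
      // collect() is acceptable here only because the demo result is tiny.
      println(zeroOutDegreeIds(sampleGraph(sc)).collect().sorted.mkString(", ")) // prints 4, 6
    } finally {
      sc.stop()
    }
  }
}
```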

One thing I would point out is that you probably want to avoid finalVertexes.collect() in
a large-scale system: it pulls all of those vertex IDs into the driver and then pushes them
back out to the executors as part of the filter operation. A better strategy for large graphs
would be:

1. build a graph based on the existing graph where the vertex attribute is the vertex's out-degree
(the GraphX documentation shows how to do this)
2. filter this “degrees” graph to give you just the 0-degree vertices
3. use graph.mask, passing in the 0-degree graph, to get the original graph restricted to just the 0-degree vertices
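
The three steps above can be sketched as follows, assuming the same Scala/GraphX setup. ZeroOutDegreeByMask, zeroOutDegreeSubgraph and sampleGraph are hypothetical names, and the vertex attribute is again simplified to a String. Step 1 follows the outerJoinVertices/outDegrees pattern from the GraphX programming guide, step 2 uses subgraph with a vertex predicate, and step 3 uses Graph.mask, which keeps the attributes of the graph it is called on.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Hypothetical helper object: a sketch of the degrees-graph + mask strategy.
object ZeroOutDegreeByMask {
  def zeroOutDegreeSubgraph[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VD, ED] = {
    // Step 1: same structure, vertex attribute = out-degree (absent from outDegrees => 0).
    val degrees: Graph[Int, ED] =
      graph.outerJoinVertices(graph.outDegrees) { (_, _, degOpt) => degOpt.getOrElse(0) }

    // Step 2: keep only 0-out-degree vertices; subgraph also drops any edge
    // whose endpoints do not both survive the vertex predicate.
    val zeroDegrees: Graph[Int, ED] = degrees.subgraph(vpred = (_, deg) => deg == 0)

    // Step 3: restrict the original graph to those vertices, keeping its own attributes.
    graph.mask(zeroDegrees)
  }

  // The graph from the question, with a String vertex attribute (assumed simplification).
  def sampleGraph(sc: SparkContext): Graph[String, Boolean] = {
    val vertices: RDD[(VertexId, String)] = sc.parallelize(
      Seq((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d"), (5L, "e"), (6L, "f")))
    val edges: RDD[Edge[Boolean]] = sc.parallelize(
      Seq(Edge(1L, 2L, true), Edge(2L, 3L, true), Edge(3L, 4L, true), Edge(5L, 2L, true)))
    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("zero-out-degree-mask"))
    try {
      val result = zeroOutDegreeSubgraph(sampleGraph(sc))
      // Only transformations so far; collect() is the action that runs the job.
      result.vertices.collect().sortBy(_._1).foreach(println) // prints (4,d) then (6,f)
    } finally {
      sc.stop()
    }
  }
}
```

Because subgraph drops every edge with a removed endpoint, the masked result for this sample graph contains vertices 4 and 6 and no edges.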

This is just one of several possible variations; the key point is that everything is just a graph
transformation until you call an action on the resulting graph.
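
In the same spirit, the subtract-based code from the question can stay entirely distributed by replacing keys.subtract(...), collect() and contains with a single subtractByKey against outDegrees. This is a swapped-in RDD technique rather than the graph-based strategy above; a minimal sketch, assuming the same setup, where ZeroOutDegreeNoCollect, zeroOutDegreeVertices and sampleGraph are hypothetical names. Nothing executes until an action such as count() is called on the result.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Hypothetical helper object: a sketch, not part of the GraphX API.
object ZeroOutDegreeNoCollect {
  // Vertices whose ID does not appear in outDegrees, i.e. with no outgoing edges.
  // subtractByKey runs on the executors, so no vertex list goes through the driver.
  def zeroOutDegreeVertices[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): RDD[(VertexId, VD)] =
    graph.vertices.subtractByKey(graph.outDegrees)

  // The graph from the question, with a String vertex attribute (assumed simplification).
  def sampleGraph(sc: SparkContext): Graph[String, Boolean] = {
    val vertices: RDD[(VertexId, String)] = sc.parallelize(
      Seq((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d"), (5L, "e"), (6L, "f")))
    val edges: RDD[Edge[Boolean]] = sc.parallelize(
      Seq(Edge(1L, 2L, true), Edge(2L, 3L, true), Edge(3L, 4L, true), Edge(5L, 2L, true)))
    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("zero-out-degree-rdd"))
    try {
      val zero = zeroOutDegreeVertices(sampleGraph(sc)) // lazy: no job has run yet
      println(zero.count()) // count() is the action that triggers the work; prints 2
    } finally {
      sc.stop()
    }
  }
}
```
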
Robin East
Spark GraphX in Action, Michael Malak and Robin East
Manning Publications Co. <>

> On 26 Feb 2016, at 11:59, Guillermo Ortiz <> wrote:
> I'm new to GraphX. I need to get the vertices without outgoing edges.
> I guess it's pretty easy, but my solution is complicated and inefficient:
> import org.apache.spark.graphx._
> import org.apache.spark.rdd.RDD
> val vertices: RDD[(VertexId, (List[String], List[String]))] =
>   sc.parallelize(Array((1L, (List("a"), List[String]())),
>     (2L, (List("b"), List[String]())),
>     (3L, (List("c"), List[String]())),
>     (4L, (List("d"), List[String]())),
>     (5L, (List("e"), List[String]())),
>     (6L, (List("f"), List[String]()))))
> // Create an RDD for edges
> val relationships: RDD[Edge[Boolean]] =
>   sc.parallelize(Array(Edge(1L, 2L, true), Edge(2L, 3L, true), Edge(3L, 4L, true),
>     Edge(5L, 2L, true)))
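> // minGraph: the graph built from the two RDDs above
> val minGraph = Graph(vertices, relationships)
> // IDs of vertices with at least one outgoing edge
> val out = minGraph.outDegrees.map(vertex => vertex._1)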
> // All vertex IDs minus those with out-edges: the vertices with no out-edges
> val finalVertexes = minGraph.vertices.keys.subtract(out)
> // There must be a better way than this...
> val nodes = finalVertexes.collect()
> val result = minGraph.vertices.filter(v => nodes.contains(v._1))
> What's the right way to do this operation? It seems like it should be pretty easy.
