spark-user mailing list archives

From Mohammed Guller
Subject RE: Get all vertexes with outDegree equals to 0 with GraphX
Date Sat, 27 Feb 2016 17:27:10 GMT
Perhaps the documentation of the filter method would help. Here is the method signature (copied
from the API doc):

def filter[VD2, ED2](
    preprocess: (Graph[VD, ED]) => Graph[VD2, ED2],
    epred: (EdgeTriplet[VD2, ED2]) => Boolean = (x: EdgeTriplet[VD2, ED2]) => true,
    vpred: (VertexId, VD2) => Boolean = (v: VertexId, d: VD2) => true)
This method returns a subgraph of the original graph. The data in the original graph remains
unchanged. Brief description of the arguments:

VD2:    vertex type the vpred operates on
ED2:    edge type the epred operates on
preprocess:   a function to compute new vertex and edge data before filtering
epred:   edge predicate to filter on after preprocess
vpred:   vertex predicate to filter on after preprocess

In the solution below, the first function literal is the preprocess argument. The vpred argument
is passed as a named argument because we are using the default value for epred.
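
The named-argument point can be illustrated without Spark. The following is only a minimal sketch in plain Scala: `NamedArgDemo` and `keep` are hypothetical names, not part of GraphX, and `keep` merely imitates a method that has two defaulted predicate parameters in a row, the way filter has epred and vpred.

```scala
// Hypothetical stand-in (not GraphX): a method with two defaulted predicates.
object NamedArgDemo {
  def keep(
      items: Seq[(Long, Int)],
      idPred: Long => Boolean = (_: Long) => true,   // plays the role of epred
      valuePred: Int => Boolean = (_: Int) => true   // plays the role of vpred
  ): Seq[(Long, Int)] =
    items.filter { case (id, value) => idPred(id) && valuePred(value) }

  def main(args: Array[String]): Unit = {
    // (vertex id, out-degree) pairs, made up for illustration
    val degrees = Seq((1L, 1), (4L, 0), (6L, 0))
    // idPred keeps its default, so valuePred has to be passed by name;
    // passed positionally, the function would be taken as idPred instead.
    val zeros = keep(degrees, valuePred = (d: Int) => d == 0)
    println(zeros.map(_._1))
  }
}
```

Passing the second predicate positionally would bind it to the first defaulted parameter, which is why the solution names vpred explicitly.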


Author: Big Data Analytics with Spark

From: Guillermo Ortiz
Sent: Saturday, February 27, 2016 6:17 AM
To: Mohammed Guller
Cc: Robin East; user
Subject: Re: Get all vertexes with outDegree equals to 0 with GraphX

Thank you. I have to think about what the code does, because I am a bit of a noob in Scala and it's
hard for me to understand.

2016-02-27 3:53 GMT+01:00 Mohammed Guller:
Here is another solution (minGraph is the graph from your code; I assume that is your original
graph):

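val graphWithNoOutEdges = minGraph.filter(
  // preprocess: replace each vertex attribute with its out-degree (0 if it has no out edges)
  graph => graph.outerJoinVertices(graph.outDegrees) {(vId, vData, outDegreesOpt) =>
    outDegreesOpt.getOrElse(0)},
  vpred = (vId: VertexId, vOutDegrees: Int) => vOutDegrees == 0
)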

val verticesWithNoOutEdges = graphWithNoOutEdges.vertices

Author: Big Data Analytics with Spark

From: Guillermo Ortiz
Sent: Friday, February 26, 2016 5:46 AM
To: Robin East
Cc: user
Subject: Re: Get all vertexes with outDegree equals to 0 with GraphX

Yes, I am not really happy with that "collect".
I was looking at using the subgraph method and other options, but didn't figure out anything
easy or direct.

I'm going to try your idea.

2016-02-26 14:16 GMT+01:00 Robin East:
Whilst I can think of other ways to do it I don’t think they would be conceptually or syntactically
any simpler. GraphX doesn’t have the concept of built-in vertex properties which would make
this simpler - a vertex in GraphX is a Vertex ID (Long) and a bunch of custom attributes that
you assign. This means you have to find a way of ‘pushing’ the vertex degree into the
graph so you can do comparisons (cf a join in relational databases) or as you have done create
a list and filter against that (cf filtering against a sub-query in a relational database).

One thing I would point out is that you probably want to avoid finalVertexes.collect() for
a large-scale system - this will pull all the vertices into the driver and then push them
out to the executors again as part of the filter operation. A better strategy for large graphs
would be:

1. build a graph based on the existing graph where the vertex attribute is the vertex degree
   (the GraphX documentation shows how to do this)
2. filter this “degrees” graph to just give you 0-degree vertices
3. use graph.mask, passing in the 0-degree graph, to get the original graph with just 0-degree
   vertices

This is just one variation among several possibilities. The key point is that everything is just a
graph transformation until you call an action on the resulting graph.
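
The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not tested against the poster's data types: the object and method names are made up, out-degree is used as "the vertex degree" because that is what the question asks about, the vertex attributes are simplified to strings, and a local SparkContext is created only so that the sketch is self-contained.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical helper; the names are illustrative only.
object ZeroOutDegreeSketch {

  def zeroOutDegreeSubgraph[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VD, ED] = {
    // Step 1: a graph whose vertex attribute is the out-degree.
    // outDegrees omits vertices with no out edges, hence getOrElse(0).
    val degreeGraph: Graph[Int, ED] =
      graph.outerJoinVertices(graph.outDegrees) { (_, _, degOpt) => degOpt.getOrElse(0) }

    // Step 2: keep only the vertices whose out-degree is 0.
    val zeroGraph = degreeGraph.subgraph(vpred = (id: VertexId, outDegree: Int) => outDegree == 0)

    // Step 3: mask the original graph so the original attributes are kept.
    graph.mask(zeroGraph)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("zero-out-degree-sketch")
    val sc = new SparkContext(conf)
    try {
      // Same vertex ids and edges as in the question, with simplified attributes.
      val vertices: RDD[(VertexId, String)] = sc.parallelize(
        Seq((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d"), (5L, "e"), (6L, "f")))
      val edges: RDD[Edge[Boolean]] = sc.parallelize(
        Seq(Edge(1L, 2L, true), Edge(2L, 3L, true), Edge(3L, 4L, true), Edge(5L, 2L, true)))

      val result = zeroOutDegreeSubgraph(Graph(vertices, edges))

      // Nothing runs until this action; collect() is used only to print a tiny result.
      println(result.vertices.map(_._1).collect().sorted.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

With the edges from the question, only vertices 4 and 6 have no outgoing edge, so those are the two ids the sketch should keep.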
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.

On 26 Feb 2016, at 11:59, Guillermo Ortiz wrote:

I'm new to GraphX. I need to get the vertices without out edges.
I guess that it's pretty easy, but the way I did it is pretty complicated and inefficient.

val vertices: RDD[(VertexId, (List[String], List[String]))] =
  sc.parallelize(Array((1L, (List("a"), List[String]())),
    (2L, (List("b"), List[String]())),
    (3L, (List("c"), List[String]())),
    (4L, (List("d"), List[String]())),
    (5L, (List("e"), List[String]())),
    (6L, (List("f"), List[String]()))))

// Create an RDD for edges
val relationships: RDD[Edge[Boolean]] =
  sc.parallelize(Array(Edge(1L, 2L, true), Edge(2L, 3L, true), Edge(3L, 4L, true),
    Edge(5L, 2L, true)))

val out = minGraph.outDegrees.map(vertex => vertex._1)

val finalVertexes = minGraph.vertices.keys.subtract(out)

// There must be a better way than this.
val nodes = finalVertexes.collect()
val result = minGraph.vertices.filter(v => nodes.contains(v._1))

What's a good way to do this operation? It seems that it should be pretty easy.
