spark-user mailing list archives

From MA2
Subject traverse a graph based on edge properties whilst counting matching vertex attributes
Date Tue, 09 Jun 2015 15:29:35 GMT
Hi All, 

I was hoping somebody might be able to help out.

I currently have a network built using GraphX which looks like the following
(only with a much larger number of vertices and edges):

ID    Attribute1  Attribute2
1001  2           0
1002  1           0
1003  2           1
1004  3           2
1006  4           0
1007  5           1

Source  Destination  Attribute
1001    1002         7
1002    1003         7
1003    1004         7
1004    1005         3
1002    1006         5
1006    1007         5

For each vertex I need to send a message down a chain to each connected
component based on the edge attribute, and count how many vertices along the
chain have a first attribute that matches the second attribute of the
starting vertex.

So for example: 
For vertex 1004 the connecting edge attribute is 7, so I want to identify
each component which is connected to 1004 by edge attribute 7, in this case
it would be 1001->1002->1003->1004, then pattern match the second vertex
attribute from 1004 (in this case 2) to any matching first vertex attributes
along the chain (in this case it would match with 1003 and 1001, giving me a
total count of 2). 
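
To make the example above concrete, here is a minimal plain-Scala sketch (no
Spark involved) that reproduces the count for vertex 1004 on the sample data.
The names chain and matches are hypothetical helpers, and it assumes that
"connected" ignores edge direction and that the starting vertex itself is not
counted; neither assumption is settled by the description above.

```scala
// Vertex table from the post: id -> (Attribute1, Attribute2).
val vertices: Map[Long, (Int, Int)] = Map(
  1001L -> (2, 0), 1002L -> (1, 0), 1003L -> (2, 1),
  1004L -> (3, 2), 1006L -> (4, 0), 1007L -> (5, 1)
)

// Edge table from the post: (source, destination, attribute).
val edges: Seq[(Long, Long, Int)] = Seq(
  (1001L, 1002L, 7), (1002L, 1003L, 7), (1003L, 1004L, 7),
  (1004L, 1005L, 3), (1002L, 1006L, 5), (1006L, 1007L, 5)
)

// All vertices reachable from `start` using only edges with attribute `attr`.
// Assumption: edge direction is ignored and `start` itself is excluded.
def chain(start: Long, attr: Int): Set[Long] = {
  val adjacency: Map[Long, Seq[Long]] = edges
    .filter(_._3 == attr)
    .flatMap { case (s, d, _) => Seq(s -> d, d -> s) }
    .groupBy(_._1)
    .map { case (k, pairs) => k -> pairs.map(_._2) }

  @scala.annotation.tailrec
  def loop(frontier: Set[Long], seen: Set[Long]): Set[Long] =
    if (frontier.isEmpty) seen
    else {
      val next = frontier.flatMap(v => adjacency.getOrElse(v, Seq.empty)) -- seen
      loop(next, seen ++ next)
    }

  loop(Set(start), Set(start)) - start
}

// Count chain members whose Attribute1 equals the start vertex's Attribute2.
// Vertices with no attribute row (such as 1005) never match.
def matches(start: Long, attr: Int): Int = {
  val target = vertices(start)._2
  chain(start, attr).count(v => vertices.get(v).exists(_._1 == target))
}
```

With these definitions, matches(1004L, 7) walks 1001, 1002 and 1003 and
returns 2, which agrees with the count worked out by hand above.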

The two bits of code I have at the moment are, firstly, one to take a
subgraph by edge property (although I have to specify the edge property by
hand)

      val edgeProperty = "7" 
      val fGraph = graph.subgraph(epred = e => e.attr == edgeProperty) 

and, secondly, code which counts matches among the immediate neighbours, but
does not count down the chain of connected nodes.

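    val counts = {
      fGraph.collectNeighbors(EdgeDirection.In).join(graph.vertices).map {
        case (id, t) => {
          val neighbors: Array[(VertexId, Array[String])] = t._1
          val nodeAttr = t._2
          // presumably: count the in-neighbours whose attribute x(0)
          // equals this vertex's attribute nodeAttr(2)
          neighbors.map(_._2).filter(x => x(0) == nodeAttr(2)).size
        }
      }
    }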

The problem I’m having is to make this process automatic and to pattern
match down all connected components, so for each vertex: 

1. Subgraph by all edge properties which connect to it 
2. Count all matching vertex properties along each of these subgraphs 
3. Produce a count at the end for each vertex 
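
One possible way to automate the three steps above, offered only as a sketch
under stated assumptions: loop over the distinct edge attributes, take the
subgraph for each, label its connected components, and count matches within
each component. It assumes the vertex attribute is an (Attribute1,
Attribute2) pair of Ints, the edge attribute is an Int, the number of
distinct edge attributes is small enough to collect to the driver, and
"connected" ignores edge direction. The function name chainMatchCounts is
hypothetical. It does not call Pregel directly, although GraphX's
connectedComponents is itself implemented on top of Pregel.

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Assumed types: vertex attribute = (Attribute1, Attribute2), edge attribute = Int.
def chainMatchCounts(graph: Graph[(Int, Int), Int]): RDD[(VertexId, Int)] = {
  // Step 1: the distinct edge attributes (assumed few enough to collect).
  val edgeAttrs: Array[Int] = graph.edges.map(_.attr).distinct().collect()

  val perAttr: Array[RDD[(VertexId, Int)]] = edgeAttrs.map { a =>
    val sub = graph.subgraph(epred = e => e.attr == a)

    // (vertex id, (component id, vertex attributes)), restricted to vertices
    // that touch at least one edge with attribute `a` (they appear in degrees).
    val members: RDD[(VertexId, (VertexId, (Int, Int)))] =
      sub.connectedComponents().vertices
        .join(sub.degrees)
        .join(graph.vertices)
        .map { case (id, ((cc, _), attrs)) => (id, (cc, attrs)) }

    // Step 2: per component, how often each Attribute1 value occurs.
    val attr1Counts: RDD[((VertexId, Int), Int)] =
      members.map { case (_, (cc, (a1, _))) => ((cc, a1), 1) }.reduceByKey(_ + _)

    // Look up each vertex's Attribute2 among its component's Attribute1
    // values, subtracting the vertex itself if it happens to match.
    members
      .map { case (id, (cc, (a1, a2))) => ((cc, a2), (id, a1 == a2)) }
      .leftOuterJoin(attr1Counts)
      .map { case (_, ((id, selfMatch), n)) =>
        (id, n.getOrElse(0) - (if (selfMatch) 1 else 0))
      }
  }

  // Step 3: one total per vertex, summed over all edge attributes.
  graph.vertices.sparkContext.union(perAttr.toSeq).reduceByKey(_ + _)
}
```

On the sample data above (with some default attribute supplied for vertex
1005, which has no row in the vertex table), this should give 2 for vertex
1004. Running one subgraph per edge attribute will be slow if there are many
distinct attributes, so it is meant as a starting point rather than a tuned
solution.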

I’m still pretty new to Spark and Scala, so I may be going about this in a
very inefficient way. Any suggestions on how best to achieve this task would
be most welcome. For example, would this be possible using Pregel?
