spark-user mailing list archives

From Ankur Dave <ankurd...@gmail.com>
Subject Re: Filter using the Vertex Ids
Date Wed, 03 Dec 2014 10:02:01 GMT
At 2014-12-02 22:01:20 -0800, Deep Pradhan <pradhandeep1991@gmail.com> wrote:
> I have a graph which returns the following on doing graph.vertices
> (1, 1.0)
> (2, 1.0)
> (3, 2.0)
> (4, 2.0)
> (5, 0.0)
>
> I want to group all the vertices with the same attribute together, e.g.
> into one RDD, so that all vertices sharing an attribute end up together.

You can do this by flipping the tuples so the values become the keys, then using one of the
by-key functions in PairRDDFunctions:

    val a: RDD[(Int, Double)] = sc.parallelize(List(
      (1, 1.0),
      (2, 1.0),
      (3, 2.0),
      (4, 2.0),
      (5, 0.0)))
    
    // Flip each (id, attr) pair so the attribute becomes the key
    val b: RDD[(Double, Int)] = a.map(kv => (kv._2, kv._1))
    
    // Group vertex ids by their shared attribute
    val c: RDD[(Double, Iterable[Int])] = b.groupByKey(numPartitions = 5)
    
    c.collect.foreach(println)
    // (0.0,CompactBuffer(5))
    // (1.0,CompactBuffer(1, 2))
    // (2.0,CompactBuffer(3, 4))
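
For intuition, the same flip-then-group step can be sketched with plain Scala collections, no Spark needed (the local `pairs` list below is hypothetical data mirroring the vertices above; `groupBy` here plays the role of `groupByKey`):

```scala
// Local data mirroring the (id, attribute) vertices above
val pairs = List((1, 1.0), (2, 1.0), (3, 2.0), (4, 2.0), (5, 0.0))

// Group ids by attribute: groupBy keys on the attribute, then we drop
// the attribute from each grouped pair, keeping only the ids
val grouped: Map[Double, List[Int]] =
  pairs.groupBy(_._2).map { case (attr, kvs) => (attr, kvs.map(_._1)) }
// grouped(1.0) == List(1, 2); grouped(2.0) == List(3, 4); grouped(0.0) == List(5)
```

The Spark version distributes exactly this computation: flipping makes the attribute the key, and the shuffle in `groupByKey` brings equal keys to the same partition.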

Ankur

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

