And one more thing: the given tuples
(1, 1.0)
(2, 1.0)
(3, 2.0)
(4, 2.0)
(5, 0.0)
are part of an RDD; they are not just tuples. graph.vertices returns the above tuples, which are part of a VertexRDD.

On Wed, Dec 3, 2014 at 3:43 PM, Deep Pradhan wrote:

> This is just an example, but if my graph is big, there will be so many
> tuples to handle. I cannot manually do
> val a: RDD[(Int, Double)] = sc.parallelize(List(
>   (1, 1.0),
>   (2, 1.0),
>   (3, 2.0),
>   (4, 2.0),
>   (5, 0.0)))
> for all the vertices in the graph.
> What should I do in that case?
> We cannot do sc.parallelize(List(VertexRDD)), can we?
>
> On Wed, Dec 3, 2014 at 3:32 PM, Ankur Dave wrote:
>
>> At 2014-12-02 22:01:20 -0800, Deep Pradhan wrote:
>> > I have a graph which returns the following on doing graph.vertices
>> > (1, 1.0)
>> > (2, 1.0)
>> > (3, 2.0)
>> > (4, 2.0)
>> > (5, 0.0)
>> >
>> > I want to group all the vertices with the same attribute together, like into
>> > one RDD or something. I want all the vertices with same attribute to be
>> > together.
>>
>> You can do this by flipping the tuples so the values become the keys,
>> then using one of the by-key functions in PairRDDFunctions:
>>
>> val a: RDD[(Int, Double)] = sc.parallelize(List(
>>   (1, 1.0),
>>   (2, 1.0),
>>   (3, 2.0),
>>   (4, 2.0),
>>   (5, 0.0)))
>>
>> val b: RDD[(Double, Int)] = a.map(kv => (kv._2, kv._1))
>>
>> val c: RDD[(Double, Iterable[Int])] = b.groupByKey(numPartitions = 5)
>>
>> c.collect.foreach(println)
>> // (0.0,CompactBuffer(5))
>> // (1.0,CompactBuffer(1, 2))
>> // (2.0,CompactBuffer(3, 4))
>>
>> Ankur
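
For what it's worth, here is a minimal sketch of how the flip-and-group suggestion might be applied to graph.vertices directly, with no manual sc.parallelize of the vertex list. It relies on VertexRDD[VD] being a subclass of RDD[(VertexId, VD)], so the usual RDD operations should be available on it. The object name GroupVerticesByAttribute, the helper groupByAttribute, the toy edges and the local[*] master are assumptions made for illustration only; they are not from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
// On Spark releases before 1.3, also add: import org.apache.spark.SparkContext._
// (needed there for the implicit conversion to PairRDDFunctions).

object GroupVerticesByAttribute {

  // Hypothetical helper: flips each (id, attribute) pair so the attribute
  // becomes the key, then groups the vertex ids that share an attribute.
  def groupByAttribute(graph: Graph[Double, Int]): RDD[(Double, Iterable[VertexId])] =
    graph.vertices
      .map { case (id, attr) => (attr, id) }
      .groupByKey()

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run on one machine.
    val conf = new SparkConf().setAppName("group-vertices").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Toy graph standing in for the real one; the edges are made up.
      val vertices: RDD[(VertexId, Double)] = sc.parallelize(Seq(
        (1L, 1.0), (2L, 1.0), (3L, 2.0), (4L, 2.0), (5L, 0.0)))
      val edges: RDD[Edge[Int]] = sc.parallelize(Seq(
        Edge(1L, 2L, 1), Edge(3L, 4L, 1), Edge(4L, 5L, 1)))
      val graph: Graph[Double, Int] = Graph(vertices, edges)

      // One line per distinct attribute, each with the ids carrying it.
      groupByAttribute(graph).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that vertex ids come back as VertexId (a Long) rather than Int, and that groupByKey pulls every id for a given attribute into a single in-memory collection, so for a large graph with a few very common attributes an aggregating by-key function may be a better fit.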