And one more thing, the given tuples
(1, 1.0)
(2, 1.0)
(3, 2.0)
(4, 2.0)
(5, 0.0)
are part of an RDD; they are not just plain tuples.
graph.vertices returns the above tuples, which are part of a VertexRDD.

On Wed, Dec 3, 2014 at 3:43 PM, Deep Pradhan wrote:
This is just an example, but if my graph is big, there will be so many tuples to handle. I cannot manually do
val a: RDD[(Int, Double)] = sc.parallelize(List(
      (1, 1.0),
      (2, 1.0),
      (3, 2.0),
      (4, 2.0),
      (5, 0.0)))
for all the vertices in the graph.
What should I do in that case?
We cannot do sc.parallelize(List(VertexRDD)), can we?

On Wed, Dec 3, 2014 at 3:32 PM, Ankur Dave wrote:
> I have a graph which returns the following on doing graph.vertices
> (1, 1.0)
> (2, 1.0)
> (3, 2.0)
> (4, 2.0)
> (5, 0.0)
>
> I want to group all the vertices with the same attribute together, like into
> one RDD or something. I want all the vertices with same attribute to be
> together.

You can do this by flipping the tuples so the values become the keys, then using one of the by-key functions in PairRDDFunctions:

    val a: RDD[(Int, Double)] = sc.parallelize(List(
      (1, 1.0),
      (2, 1.0),
      (3, 2.0),
      (4, 2.0),
      (5, 0.0)))

    val b: RDD[(Double, Int)] = a.map(kv => (kv._2, kv._1))

    val c: RDD[(Double, Iterable[Int])] = b.groupByKey(numPartitions = 5)

    c.collect.foreach(println)
    // (0.0,CompactBuffer(5))
    // (1.0,CompactBuffer(1, 2))
    // (2.0,CompactBuffer(3, 4))
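
For a real graph, the same two steps should apply directly to graph.vertices, because VertexRDD[VD] extends RDD[(VertexId, VD)]; there should be no need to rebuild the list with sc.parallelize. Below is a minimal sketch, assuming a Graph[Double, Int]. The object name, the groupByAttribute helper, the edges, and the local master setting are made up for illustration and are not from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; required on Spark 1.2 and earlier
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object GroupVerticesByAttribute {
  // Hypothetical helper: flip (id, attr) to (attr, id), then group the
  // vertex ids that share the same attribute value.
  def groupByAttribute(graph: Graph[Double, Int]): RDD[(Double, Iterable[VertexId])] =
    graph.vertices
      .map { case (id, attr) => (attr, id) }
      .groupByKey()

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, only so the sketch runs standalone.
    val conf = new SparkConf().setAppName("group-vertices").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Vertex attributes taken from the example in this thread.
      val vertices: RDD[(VertexId, Double)] = sc.parallelize(Seq(
        (1L, 1.0), (2L, 1.0), (3L, 2.0), (4L, 2.0), (5L, 0.0)))
      // Illustrative edges only; the thread does not give the graph's edges.
      val edges: RDD[Edge[Int]] = sc.parallelize(Seq(
        Edge(1L, 2L, 1), Edge(3L, 4L, 1), Edge(4L, 5L, 1)))
      val graph = Graph(vertices, edges)

      // Order of the printed groups is not guaranteed.
      groupByAttribute(graph).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that groupByKey shuffles every vertex id across the cluster, so for a large graph it may be worth checking whether an aggregate per attribute (for example via reduceByKey) would be enough.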

Ankur
