spark-user mailing list archives

From Deep Pradhan <pradhandeep1991@gmail.com>
Subject Re: Filter using the Vertex Ids
Date Wed, 03 Dec 2014 10:13:49 GMT
This is just an example; in a big graph there would be far too many tuples
to list by hand. I cannot manually write
val a: RDD[(Int, Double)] = sc.parallelize(List(
      (1, 1.0),
      (2, 1.0),
      (3, 2.0),
      (4, 2.0),
      (5, 0.0)))
for all the vertices in the graph.
What should I do in that case?
We cannot do sc.parallelize(List(VertexRDD)), can we?
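
One way around this: graph.vertices is itself an RDD[(VertexId, Double)],
so the flip-and-group pattern in the quoted reply below can be applied to
it directly, with no sc.parallelize at all. A minimal sketch, assuming a
Graph[Double, _] named graph:

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    // graph.vertices is already an RDD of (id, attribute) pairs.
    val flipped: RDD[(Double, VertexId)] =
      graph.vertices.map { case (id, attr) => (attr, id) }

    // Group the vertex ids by their shared attribute value.
    val grouped: RDD[(Double, Iterable[VertexId])] = flipped.groupByKey()

    grouped.collect.foreach(println)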

On Wed, Dec 3, 2014 at 3:32 PM, Ankur Dave <ankurdave@gmail.com> wrote:

> At 2014-12-02 22:01:20 -0800, Deep Pradhan <pradhandeep1991@gmail.com>
> wrote:
> > I have a graph which returns the following on doing graph.vertices
> > (1, 1.0)
> > (2, 1.0)
> > (3, 2.0)
> > (4, 2.0)
> > (5, 0.0)
> >
> > I want to group all the vertices with the same attribute together,
> > e.g. into one RDD per attribute value.
>
> You can do this by flipping the tuples so the values become the keys, then
> using one of the by-key functions in PairRDDFunctions:
>
>     val a: RDD[(Int, Double)] = sc.parallelize(List(
>       (1, 1.0),
>       (2, 1.0),
>       (3, 2.0),
>       (4, 2.0),
>       (5, 0.0)))
>
>     val b: RDD[(Double, Int)] = a.map(kv => (kv._2, kv._1))
>
>     val c: RDD[(Double, Iterable[Int])] = b.groupByKey(numPartitions = 5)
>
>     c.collect.foreach(println)
>     // (0.0,CompactBuffer(5))
>     // (1.0,CompactBuffer(1, 2))
>     // (2.0,CompactBuffer(3, 4))
>
> Ankur
>
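
If what is wanted is literally one RDD per attribute value, rather than a
single grouped RDD, one option is to filter the pair RDD once per distinct
attribute. A sketch, assuming the same RDD a as in the quoted code; this
launches one Spark job per distinct value, so it is only reasonable when
there are few of them:

    import org.apache.spark.rdd.RDD

    // Collect the distinct attribute values to the driver.
    val attrs: Array[Double] = a.map(_._2).distinct().collect()

    // Build a map from each attribute value to the RDD of matching ids.
    val perAttr: Map[Double, RDD[Int]] =
      attrs.map(attr => attr -> a.filter(_._2 == attr).keys).toMap

    // perAttr(1.0).collect()  // Array(1, 2)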
