spark-user mailing list archives

From Deep Pradhan <pradhandeep1...@gmail.com>
Subject Re: Filter using the Vertex Ids
Date Wed, 03 Dec 2014 10:17:14 GMT
And one more thing: the tuples given above,
(1, 1.0)
(2, 1.0)
(3, 2.0)
(4, 2.0)
(5, 0.0)

are part of an RDD, not just standalone tuples.
graph.vertices returns the above tuples as part of a VertexRDD.
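Since graph.vertices is already an RDD, the flip-and-group approach from the reply below can be applied to it directly, with no re-parallelizing. As a rough sketch of that same logic without a Spark cluster, here is the flip-and-group done on plain Scala collections, using the tuples from this thread; `groupBy` here stands in for the RDD's `groupByKey`, so this is an analogue of the Spark code, not the Spark API itself:

```scala
// Same (id, attr) pairs as returned by graph.vertices in this thread.
val a: List[(Int, Double)] =
  List((1, 1.0), (2, 1.0), (3, 2.0), (4, 2.0), (5, 0.0))

// Flip so the attribute becomes the key, then group by it --
// the plain-collections analogue of a.map(...).groupByKey() on an RDD.
val grouped: Map[Double, List[Int]] =
  a.map { case (id, attr) => (attr, id) }
   .groupBy(_._1)
   .map { case (attr, pairs) => (attr, pairs.map(_._2)) }

grouped.toList.sortBy(_._1).foreach(println)
// (0.0,List(5))
// (1.0,List(1, 2))
// (2.0,List(3, 4))
```

On the real VertexRDD the same shape works, only with RDD operations in place of the collection ones.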


On Wed, Dec 3, 2014 at 3:43 PM, Deep Pradhan <pradhandeep1991@gmail.com>
wrote:

> This is just an example but if my graph is big, there will be so many
> tuples to handle. I cannot manually do
> val a: RDD[(Int, Double)] = sc.parallelize(List(
>       (1, 1.0),
>       (2, 1.0),
>       (3, 2.0),
>       (4, 2.0),
>       (5, 0.0)))
> for all the vertices in the graph.
> What should I do in that case?
> We cannot do *sc.parallelize(List(VertexRDD))*, can we?
>
> On Wed, Dec 3, 2014 at 3:32 PM, Ankur Dave <ankurdave@gmail.com> wrote:
>
>> At 2014-12-02 22:01:20 -0800, Deep Pradhan <pradhandeep1991@gmail.com>
>> wrote:
>> > I have a graph which returns the following on doing graph.vertices
>> > (1, 1.0)
>> > (2, 1.0)
>> > (3, 2.0)
>> > (4, 2.0)
>> > (5, 0.0)
>> >
>> > I want to group all the vertices with the same attribute together, like
>> into
>> > one RDD or something. I want all the vertices with same attribute to be
>> > together.
>>
>> You can do this by flipping the tuples so the values become the keys,
>> then using one of the by-key functions in PairRDDFunctions:
>>
>>     val a: RDD[(Int, Double)] = sc.parallelize(List(
>>       (1, 1.0),
>>       (2, 1.0),
>>       (3, 2.0),
>>       (4, 2.0),
>>       (5, 0.0)))
>>
>>     val b: RDD[(Double, Int)] = a.map(kv => (kv._2, kv._1))
>>
>>     val c: RDD[(Double, Iterable[Int])] = b.groupByKey(numPartitions = 5)
>>
>>     c.collect.foreach(println)
>>     // (0.0,CompactBuffer(5))
>>     // (1.0,CompactBuffer(1, 2))
>>     // (2.0,CompactBuffer(3, 4))
>>
>> Ankur
>>
>
>
