spark-user mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: Adjacency List representation in Spark
Date Wed, 17 Sep 2014 17:14:15 GMT
Hi Harsha,

You could look through the GraphX source to see the approach taken there
and borrow ideas for your own implementation.  I'd recommend starting at
https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/Graph.scala#L385
to see the storage technique.
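The usual way to make an adjacency list "distributed in nature" is to drop the
single HashMap and key the data by source vertex in a pair RDD.  Here is a
minimal sketch of that shape in plain Scala (names like AdjacencySketch are
hypothetical; the Spark equivalents are noted in comments):

```scala
// Sketch: an adjacency list as (source -> neighbors) key-value pairs
// instead of one in-memory HashMap.  In Spark, the same shape is a pair
// RDD -- sc.parallelize(edges) followed by groupByKey (or aggregateByKey,
// which avoids shuffling whole neighbor groups) -- so the adjacency list
// is partitioned across the cluster by source vertex.
object AdjacencySketch {
  type VertexId = Long
  type Weight   = Double

  // Edge triples as extracted from the HDFS files: (src, (dst, weight)).
  val edges: Seq[(VertexId, (VertexId, Weight))] = Seq(
    (1L, (2L, 0.5)),
    (1L, (3L, 1.5)),
    (2L, (3L, 2.0))
  )

  // Local stand-in for rdd.groupByKey(): one (vertex, neighbors) entry
  // per source vertex.
  val adjacency: Map[VertexId, Seq[(VertexId, Weight)]] =
    edges.groupBy(_._1).map { case (src, es) => (src, es.map(_._2)) }

  def main(args: Array[String]): Unit = {
    adjacency.toSeq.sortBy(_._1).foreach { case (v, ns) =>
      println(s"$v -> ${ns.mkString(", ")}")
    }
  }
}
```

With this layout, lookups become joins against the keyed RDD rather than
probes into one 20 GB map, which is exactly what GraphX's EdgeRDD/VertexRDD
split does for you under the hood.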

Why do you want to avoid using GraphX?

Good luck!
Andrew

On Wed, Sep 17, 2014 at 6:43 AM, Harsha HN <99harsha.h.n99@gmail.com> wrote:

> Hello
>
> We are building an adjacency list to represent a graph. The vertices, edges,
> and weights have been extracted from HDFS files by a Spark job.
> We expect the size of the adjacency list (a HashMap) could grow beyond
> 20 GB.
> How can we represent this as an RDD, so that it is distributed in nature?
>
> Basically we are trying to fit a HashMap (the adjacency list) into a Spark
> RDD. Is there any way other than GraphX?
>
> Thanks and Regards,
> Harsha
>
