spark-user mailing list archives

From Ankur Dave <ankurd...@gmail.com>
Subject Re: [GraphX] how to set memory configurations to avoid OutOfMemoryError "GC overhead limit exceeded"
Date Tue, 09 Sep 2014 06:28:18 GMT
At 2014-09-05 12:13:18 +0200, Yifan LI <iamyifanli@gmail.com> wrote:
> But how to assign the storage level to a new vertex RDD that is mapped from
> an existing vertex RDD, e.g.
>
>     val newVertexRDD = graph.collectNeighborIds(EdgeDirection.Out).map {
>       case (id: VertexId, a: Array[VertexId]) => (id, initialHashMap(a))
>     }
>
> The new one will be combined with the existing edges RDD (MEMORY_AND_DISK)
> to construct a new graph, e.g.
>
>     val newGraph = Graph(newVertexRDD, graph.edges)

Sorry for the late reply. If you are constructing a graph from the derived VertexRDD, you
can pass the desired storage levels to the Graph constructor:

    import org.apache.spark.graphx._
    import org.apache.spark.storage.StorageLevel

    // Build the derived vertex RDD, then pass both storage levels to the
    // Graph constructor so GraphX caches its internal RDDs at those levels.
    val newVertexRDD = graph.collectNeighborIds(EdgeDirection.Out).map {
      case (id: VertexId, a: Array[VertexId]) => (id, initialHashMap(a))
    }
    val newGraph = Graph(
      newVertexRDD,
      graph.edges,
      edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
      vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)
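
To confirm the levels took effect, you can inspect the internal RDDs directly (a quick
check, using the names above):

    // Both internal RDDs should report the requested level:
    newGraph.vertices.getStorageLevel  // StorageLevel(true, true, false, true, 1)
    newGraph.edges.getStorageLevel     // StorageLevel(true, true, false, true, 1)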

For others reading, the reason GraphX needs to be told the desired storage level is that
it internally constructs temporary vertex and edge RDDs and uses them more than once, so it
has to cache them to avoid recomputation.
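
Note also that the storage level has to be chosen at construction time: if I remember the
GraphImpl internals correctly, Graph.apply eagerly caches its internal RDDs at the given
level (MEMORY_ONLY by default), and Spark does not allow re-persisting an RDD at a
different level. A minimal sketch of the failure mode, using the variables above:

    // Built with the default storage level, so the internal RDDs are
    // already cached at MEMORY_ONLY:
    val g = Graph(newVertexRDD, graph.edges)

    // Throws UnsupportedOperationException: "Cannot change storage level
    // of an RDD after it was already assigned a level"
    g.persist(StorageLevel.MEMORY_AND_DISK)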

> BTW, the return of newVertexRDD.getStorageLevel is StorageLevel(true, true,
> false, true, 1), what does it mean?

See the StorageLevel object [1]. This particular storage level corresponds to StorageLevel.MEMORY_AND_DISK.
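
The five fields in that printout are (useDisk, useMemory, useOffHeap, deserialized,
replication), so StorageLevel(true, true, false, true, 1) reads: spill to disk, keep in
memory, no off-heap storage, stored deserialized, one replica. You can also check the
equivalence directly:

    // Decoded field by field: useDisk = true, useMemory = true,
    // useOffHeap = false, deserialized = true, replication = 1
    newVertexRDD.getStorageLevel == StorageLevel.MEMORY_AND_DISK  // true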

Ankur

[1] https://github.com/apache/spark/blob/092e2f152fb674e7200cc8a2cb99a8fe0a9b2b33/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L147


