spark-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: spark graphx storage RDD memory leak
Date Sun, 10 Apr 2016 17:15:23 GMT
I see the following code toward the end of the method:

      // Unpersist the RDDs hidden by newly-materialized RDDs
      oldMessages.unpersist(blocking = false)
      prevG.unpersistVertices(blocking = false)
      prevG.edges.unpersist(blocking = false)

Wouldn't the above achieve the same effect?
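For context, those unpersist calls cover only the RDDs Pregel materializes itself (prevG, oldMessages); the caller-supplied input graph is a separate question. A minimal sketch of how one might observe this from the driver, assuming a live SparkContext `sc` and a hypothetical edge-list path (this needs a Spark environment to run):

      import org.apache.spark.graphx.GraphLoader

      // Hypothetical input path; the graph is cached before the algorithm runs.
      val graph = GraphLoader.edgeListFile(sc, "/tmp/edges.txt").cache()
      val cc = graph.connectedComponents()

      // Pregel's loop unpersists its own intermediates (prevG, oldMessages),
      // but `graph` itself stays in storage until released explicitly:
      graph.unpersistVertices(blocking = false)
      graph.edges.unpersist(blocking = false)

      // Any RDDs still pinned in storage show up here:
      sc.getPersistentRDDs.foreach { case (id, rdd) =>
        println(s"RDD $id: ${rdd.getStorageLevel}")
      }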

On Sun, Apr 10, 2016 at 9:08 AM, zhang juntao <juntao.zhang.cn@gmail.com>
wrote:

> hi experts,
>
> I’m reporting a problem with Spark GraphX. I use Zeppelin to submit Spark
> jobs; note that the Scala environment shares the same SparkContext and
> SQLContext instances. I call the connected components algorithm for some
> business logic, and I found that every time a job finished, some graph
> storage RDDs weren’t being released. After several runs there were a lot
> of storage RDDs still in memory, even though all the jobs had finished.
>
>
> So I checked the code of connectedComponents and found what may be a
> problem in *Pregel.scala*: when the param graph has already been cached,
> there isn’t any way to unpersist it, so I added the code below (marked in
> red in the original message) to solve the problem.
>
>
>     def apply[VD: ClassTag, ED: ClassTag, A: ClassTag]
>         (graph: Graph[VD, ED],
>          initialMsg: A,
>          maxIterations: Int = Int.MaxValue,
>          activeDirection: EdgeDirection = EdgeDirection.Either)
>         (vprog: (VertexId, VD, A) => VD,
>          sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
>          mergeMsg: (A, A) => A)
>       : Graph[VD, ED] = {
>
>       ......
>
>       var g = graph.mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg)).cache()
>       // added lines (marked in red in the original message):
>       graph.unpersistVertices(blocking = false)
>       graph.edges.unpersist(blocking = false)
>
>       ......
>
>     } // end of apply
>
>
> I'm not sure whether this is a bug. Thank you for your time, juntao
>
>
>
