spark-user mailing list archives

From ankurdave <ankurd...@gmail.com>
Subject Re: Caching in graphX
Date Fri, 09 May 2014 07:04:13 GMT
Unfortunately it's very difficult to get uncaching right with GraphX due to
the complicated internal dependency structure that it creates. It's
necessary to know exactly what operations you're doing on the graph in order
to unpersist correctly (i.e., in a way that avoids recomputation).

I have a pull request (https://github.com/apache/spark/pull/497) that may
make this a bit easier, but your best option is to use the Pregel API for
iterative algorithms if possible.
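
To make the Pregel suggestion concrete, here is a minimal sketch of an iterative algorithm (single-source shortest paths) written against the Pregel API, so that GraphX handles caching and uncaching of the intermediate graphs between iterations. The object name PregelSketch, the helper shortestPaths, the toy edge list, and the local master setting are all assumptions made for illustration; none of them come from the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Hypothetical example object; the names here are illustrative only.
object PregelSketch {

  // Single-source shortest paths expressed with Pregel. The caller never
  // calls cache() or unpersist() on the per-iteration graphs.
  def shortestPaths(graph: Graph[Double, Double],
                    sourceId: VertexId): Graph[Double, Double] = {
    // Distance 0 at the source, infinity everywhere else.
    val initial = graph.mapVertices { (id, _) =>
      if (id == sourceId) 0.0 else Double.PositiveInfinity
    }

    initial.pregel(Double.PositiveInfinity)(
      // Vertex program: keep the smaller of the current and incoming distance.
      (_, dist, newDist) => math.min(dist, newDist),
      // Send a message only when it would shorten the destination's distance.
      triplet =>
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        else
          Iterator.empty,
      // Merge messages bound for the same vertex.
      (a, b) => math.min(a, b)
    )
  }

  def main(args: Array[String]): Unit = {
    // Assumed local setup, for trying the sketch out on a toy graph.
    val conf = new SparkConf().setAppName("pregel-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val edges = sc.parallelize(Seq(
        Edge(1L, 2L, 1.0),
        Edge(2L, 3L, 2.0),
        Edge(1L, 3L, 5.0)
      ))
      // VD and ED are both Double, i.e. primitive attribute types.
      val graph = Graph.fromEdges(edges, Double.PositiveInfinity)
      shortestPaths(graph, 1L).vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Whether a given algorithm fits this vertex-program / send-message / merge-message shape depends on the algorithm; the sketch only shows the form such code takes.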

If that's not possible, simply leaving things cached has not been very
costly in my experience, at least as long as VD and ED (the vertex and edge
attribute types) are primitive types, which keeps the load on the garbage
collector low.

Ankur


