spark-user mailing list archives

From Jeffrey Picard <jp3...@columbia.edu>
Subject Re: GraphX Connected Components
Date Thu, 31 Jul 2014 03:32:54 GMT

On Jul 30, 2014, at 4:39 PM, Ankur Dave <ankurdave@gmail.com> wrote:

> Jeffrey Picard <jp3436@columbia.edu> writes:
>> I tried unpersisting the edges and vertices of the graph by hand, then
>> persisting the graph with persist(StorageLevel.MEMORY_AND_DISK). I still see
>> the same behavior in connected components however, and the same thing you
>> described in the storage page.
> 
> Unfortunately it's not possible to change the graph's storage level by hand without modifying
> GraphX itself, because internally GraphX will create new RDDs, persist them using MEMORY_ONLY,
> and immediately materialize them, all before you get a chance to change the storage level.
> You can see this happening in the storage page: one graph (a VertexRDD and an EdgeRDD) has
> the desired storage level, but new ones are still set to MEMORY_ONLY.
> 
>> It seems that the version of GraphX I’m using doesn't have the option for
>> setting the storage level in the GraphLoader.edgeListFile method.
>> https://spark.apache.org/docs/1.0.1/api/scala/index.html#org.apache.spark.graphx.GraphLoader$
>> [...]
>> Would that (newer?) version of GraphX with the storage level settable in the
>> edgeListFile possibly solve this, or could there still be something else going
>> on?
> 
> Yes, it looks like custom storage levels would solve the problem. That was added in apache/spark#946
> [1], which will be released as part of Spark 1.1.0. Until then, is it possible for you to
> rebuild Spark from the master branch?
> 
> Ankur
> 
> [1] https://github.com/apache/spark/pull/946

That worked! The entire thing ran in about an hour and a half, thanks!
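
For anyone following along, here is a minimal sketch of the approach described above. It assumes a Spark build that includes apache/spark#946, where GraphLoader.edgeListFile accepts edgeStorageLevel and vertexStorageLevel. The object name, app name, and input path are placeholders, not taken from my actual job.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.storage.StorageLevel

// Hypothetical driver object; the name is illustrative only.
object ConnectedComponentsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ConnectedComponentsSketch")
    val sc = new SparkContext(conf)

    // Placeholder path. Each line of the file is expected to hold
    // "srcId dstId" separated by whitespace.
    val edgeFile = "hdfs:///path/to/edges.txt"

    // Set the storage levels at load time instead of calling persist()
    // afterwards. Assumes the two named parameters from apache/spark#946.
    val graph = GraphLoader.edgeListFile(
      sc,
      edgeFile,
      edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
      vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)

    // Each vertex is labelled with the smallest vertex id in its component.
    val components = graph.connectedComponents().vertices

    // Count the distinct component labels.
    val numComponents = components.map(_._2).distinct().count()
    println(s"Number of connected components: $numComponents")

    sc.stop()
  }
}
```

The point of passing the levels to the loader is that the RDDs GraphX creates internally should then inherit MEMORY_AND_DISK rather than defaulting to MEMORY_ONLY, which is what changing the level by hand could not achieve.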

Is there by chance an easy way to build Spark apps against a build of the master branch?
So far I’ve had to use the spark-shell.
