spark-user mailing list archives

From Jeffrey Picard <>
Subject Re: GraphX Connected Components
Date Thu, 31 Jul 2014 03:32:54 GMT

On Jul 30, 2014, at 4:39 PM, Ankur Dave <> wrote:

> Jeffrey Picard <> writes:
>> I tried unpersisting the edges and vertices of the graph by hand, then
>> persisting the graph with persist(StorageLevel.MEMORY_AND_DISK). I still see
>> the same behavior in connected components however, and the same thing you
>> described in the storage page.
> Unfortunately it's not possible to change the graph's storage level by hand without modifying
> GraphX itself, because internally GraphX will create new RDDs, persist them using MEMORY_ONLY,
> and immediately materialize them, all before you get a chance to change the storage level.
> You can see this happening in the storage page: one graph (a VertexRDD and an EdgeRDD) has
> the desired storage level, but new ones are still set to MEMORY_ONLY.
>> It seems that the version of graphx I’m using doesn't have the option for
>> setting the storage level in the GraphLoader.edgeListFile method.
>> [...]
>> Would that (newer?) version of GraphX with the storage level settable in the
>> edgeListFile possibly solve this, or could there still be something else going
>> on?
> Yes, it looks like custom storage levels would solve the problem. That was added in
> apache/spark#946 [1], which will be released as part of Spark 1.1.0. Until then, is it
> possible for you to rebuild Spark from the master branch?
> Ankur
> [1]

That worked! The entire thing ran in about an hour and a half, thanks!

Is there by chance an easy way to build Spark apps against the master-branch build of Spark?
So far I’ve been having to use the spark-shell.
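For anyone who hits the same problem, the fix discussed above looks roughly like the sketch below. The file path and app setup are placeholders; the `edgeStorageLevel` and `vertexStorageLevel` parameters are the ones added to `GraphLoader.edgeListFile` in apache/spark#946 (Spark 1.1.0+).

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.storage.StorageLevel

object ConnectedComponentsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "cc-sketch")  // placeholder app setup

    // Load the edge list with both the edge and vertex RDDs persisted at
    // MEMORY_AND_DISK instead of the default MEMORY_ONLY, so large graphs
    // can spill to disk rather than being recomputed or thrashing memory.
    val graph = GraphLoader.edgeListFile(
      sc,
      "hdfs:///path/to/edges.txt",  // placeholder path
      canonicalOrientation = false,
      numEdgePartitions = -1,
      edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
      vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)

    // Connected components then runs on the disk-backed graph.
    val cc = graph.connectedComponents().vertices
    cc.take(5).foreach(println)

    sc.stop()
  }
}
```

Because the intermediate RDDs that GraphX creates internally inherit these levels, this avoids the MEMORY_ONLY behavior described earlier in the thread.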
