spark-user mailing list archives

From arpp <>
Subject Custom edge partitioning in graphX
Date Sat, 28 Mar 2015 12:54:12 GMT
Hi all,
I am working with Spark 1.0.0, mainly to use GraphX, and want to
apply custom partitioning strategies to the edge list of a graph.
I have generated an edge list file in which each line has the partition
number after the source and destination IDs. I first load the
unannotated graph using GraphLoader, then load the annotated file and
repartition the edges as follows:

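import org.apache.spark.HashPartitioner
import org.apache.spark.SparkContext._ // pair-RDD implicits (partitionBy) on Spark 1.0.0
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val unpartitionedGraph = GraphLoader.edgeListFile(sc, fname,
  minEdgePartitions = numEPart).cache()
// Workaround for Spark 1.0.0: rebuild the graph from manually repartitioned edges
val graph = Graph(unpartitionedGraph.vertices,
  partitionCustom(unpartitionedGraph.edges))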

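def partitionCustom[ED](edges: RDD[Edge[ED]]): RDD[Edge[ED]] = {
  // Key each edge by its custom partition ID, shuffle on that key, then drop the key
  edges.map(e => (customPartition(e.srcId, e.dstId), e))
    .partitionBy(new HashPartitioner(numPartitions))
    .mapPartitions(_.map(_._2), preservesPartitioning = true)
}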

def customPartition(src: VertexId, dst: VertexId): PartitionID = {
  // Search the loaded annotated file for the line with this src and dst,
  // then read the third element of that line and return it
  ???
}

But this method is inefficient, as it requires loading the same data multiple
times, and it is also slow because I perform a large number of searches on
really huge edge list files.
Please suggest more efficient ways of doing this. Also please note that I am
stuck with Spark 1.0.0, as I am only a user of the available cluster.

Arpit Kumar
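
One possible direction, offered as a sketch rather than a tested solution on Spark 1.0.0: since each line of the annotated file already carries the partition number, the edges could be built directly from that file and keyed by the third field, so no per-edge search and no second load are needed. The names AnnotatedEdgeLoader, parseLine, load and annotatedPath are hypothetical. The sketch assumes whitespace-separated "srcId dstId partitionId" lines, optional comment lines starting with "#", and partition numbers in the range 0 until numPartitions (HashPartitioner places an Int key p in partition p mod numPartitions).

```scala
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on Spark 1.0.x
import org.apache.spark.graphx.{Edge, Graph, PartitionID}
import org.apache.spark.rdd.RDD

object AnnotatedEdgeLoader {
  // Hypothetical helper: parse one "srcId dstId partitionId" line into
  // (partitionId, edge). The edge attribute is 1, as GraphLoader uses.
  def parseLine(line: String): (PartitionID, Edge[Int]) = {
    val fields = line.trim.split("\\s+")
    (fields(2).toInt, Edge(fields(0).toLong, fields(1).toLong, 1))
  }

  // Build a graph whose edge partitions follow the annotation in the file.
  def load(sc: SparkContext, annotatedPath: String, numPartitions: Int): Graph[Int, Int] = {
    val edges: RDD[Edge[Int]] = sc.textFile(annotatedPath)
      .filter(l => l.trim.nonEmpty && !l.startsWith("#")) // skip blanks and comments
      .map(parseLine)
      .partitionBy(new HashPartitioner(numPartitions))    // shuffle on the annotated ID
      .mapPartitions(_.map(_._2), preservesPartitioning = true) // drop the key
    // Vertices are derived from the edges; every vertex gets attribute 1.
    Graph.fromEdges(edges, defaultValue = 1)
  }
}
```

With this shape, a call such as AnnotatedEdgeLoader.load(sc, annotatedPath, numEPart) would take the place of both the GraphLoader call and customPartition, at the cost of a single shuffle of the edge list.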
