spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ankur Dave <ankurd...@gmail.com>
Subject Re: GraphX graph partitioning strategy
Date Fri, 25 Jul 2014 19:49:41 GMT
Hi Larry,

GraphX's graph constructor leaves the edges in their original partitions by
default. To support arbitrary multipass graph partitioning, one idea is to
take advantage of that by partitioning the graph externally to GraphX
(though possibly using information from GraphX such as the degrees), then
pass the partitioned edges to GraphX.

For example, if you had an edge partitioning function that needed the full
triplet to assign a partition, you could do this as follows:

val unpartitionedGraph: Graph[Int, Int] = ...val numPartitions: Int = 128
def getTripletPartition(e: EdgeTriplet[Int, Int]): PartitionID = ...
// Get the triplets using GraphX, then use Spark to repartition
themval partitionedEdges = unpartitionedGraph.triplets
  .map(e => (getTripletPartition(e), e))
  .partitionBy(new HashPartitioner(numPartitions))
val partitionedGraph = Graph(unpartitionedGraph.vertices, partitionedEdges)


A multipass partitioning algorithm could store its results in the edge
attribute, and then you could use the code above to do the partitioning.

Ankur <http://www.ankurdave.com/>


On Wed, Jul 23, 2014 at 11:59 PM, Larry Xiao <xiaodi@sjtu.edu.cn> wrote:

> Hi all,
>
> I'm implementing graph partitioning strategy for GraphX, learning from
> researches on graph computing.
>
> I have two questions:
>
> - a specific implement question:
> In current design, only vertex ID of src and dst are provided
> (PartitionStrategy.scala).
> And some strategies require knowledge about the graph (like degrees) and
> can consist more than one passes to finally produce the partition ID.
> So I'm changing the PartitionStrategy.getPartition API to provide more
> info, but I don't want to make it complex. (the current one looks very
> clean)
>
> - an open question:
> What advice would you give considering partitioning, considering the
> procedure Spark adopt on graph processing?
>
> Any advice is much appreciated.
>
> Best Regards,
> Larry Xiao
>
> Reference
>
> Bipartite-oriented Distributed Graph Partitioning for Big Learning.
> PowerLyra : Differentiated Graph Computation and Partitioning on Skewed
> Graphs
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message