spark-issues mailing list archives

From "Daniel Darabos (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1329) ArrayIndexOutOfBoundsException if graphx.Graph has more edge partitions than node partitions
Date Mon, 26 May 2014 11:40:01 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008771#comment-14008771 ]

Daniel Darabos commented on SPARK-1329:
---------------------------------------

Sorry, I haven't looked into fixing this. We ended up not using GraphX, so we are no longer
affected by the bug.

> ArrayIndexOutOfBoundsException if graphx.Graph has more edge partitions than node partitions
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-1329
>                 URL: https://issues.apache.org/jira/browse/SPARK-1329
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 0.9.0
>            Reporter: Daniel Darabos
>
> To reproduce, let's look at a graph with two nodes in one partition, and two edges between them split across two partitions:
> scala> val vs = sc.makeRDD(Seq(1L->null, 2L->null), 1)
> scala> val es = sc.makeRDD(Seq(graphx.Edge(1, 2, null), graphx.Edge(2, 1, null)), 2)
> scala> val g = graphx.Graph(vs, es)
> Everything seems fine, until GraphX needs to join the two RDDs:
> scala> g.triplets.collect
> [...]
> java.lang.ArrayIndexOutOfBoundsException: 1
> 	at org.apache.spark.graphx.impl.RoutingTable$$anonfun$2$$anonfun$apply$3.apply(RoutingTable.scala:76)
> 	at org.apache.spark.graphx.impl.RoutingTable$$anonfun$2$$anonfun$apply$3.apply(RoutingTable.scala:75)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> 	at org.apache.spark.graphx.impl.RoutingTable$$anonfun$2.apply(RoutingTable.scala:75)
> 	at org.apache.spark.graphx.impl.RoutingTable$$anonfun$2.apply(RoutingTable.scala:73)
> 	at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:450)
> 	at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:450)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
> 	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:71)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> 	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:85)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:53)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
> 	at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> The bug is fairly obvious in RoutingTable.createPid2Vid() -- it creates an array of length vertices.partitions.size, and then looks up partition IDs from the edges.partitionsRDD in it.
> A graph usually has more edges than nodes. So it is natural to have more edge partitions than node partitions.
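
The failure described in the report can be modelled without Spark. The following is only a minimal sketch, not the actual RoutingTable code: the object RoutingTableSketch, the helper buildBuckets, and the sample (partition ID, vertex ID) pairs are hypothetical names and data chosen for illustration. It shows that an array sized by the number of vertex partitions (1) cannot be indexed by edge partition IDs (0 and 1), while an array sized by the number of edge partitions can.

```scala
// Simplified, Spark-free model of the failure mode described in the report.
// All names here are hypothetical; this is not the GraphX implementation.
object RoutingTableSketch {
  // Allocates numBuckets buckets and files each vertex ID under its partition ID.
  // Throws ArrayIndexOutOfBoundsException if a partition ID is >= numBuckets.
  def buildBuckets(numBuckets: Int, entries: Seq[(Int, Long)]): Array[List[Long]] = {
    val buckets = Array.fill(numBuckets)(List.empty[Long])
    for ((pid, vid) <- entries) buckets(pid) = vid :: buckets(pid)
    buckets
  }

  def main(args: Array[String]): Unit = {
    val numVertexPartitions = 1 // as in: sc.makeRDD(vertices, 1)
    val numEdgePartitions = 2   // as in: sc.makeRDD(edges, 2)

    // (edge partition ID, vertex ID) pairs, assuming edge 1->2 lands in
    // partition 0 and edge 2->1 lands in partition 1.
    val entries = Seq((0, 1L), (0, 2L), (1, 2L), (1, 1L))

    // Sizing the array by the vertex partition count fails on partition ID 1.
    val outOfBounds =
      try { buildBuckets(numVertexPartitions, entries); false }
      catch { case _: ArrayIndexOutOfBoundsException => true }
    println(s"sized by vertex partitions, out of bounds: $outOfBounds")

    // Sizing the array by the edge partition count covers every ID used as an index.
    val buckets = buildBuckets(numEdgePartitions, entries)
    println(s"sized by edge partitions, buckets: ${buckets.length}")
  }
}
```

Sizing the array by the count of the RDD whose partition IDs are used as indices avoids the exception in this model. Whether that is the appropriate change in GraphX itself is not established in this thread.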



--
This message was sent by Atlassian JIRA
(v6.2#6252)
