spark-user mailing list archives

From Roman Sokolov <ole...@gmail.com>
Subject Re: assertion failed error with GraphX
Date Wed, 22 Jul 2015 23:33:30 GMT
I am also having problems with triangle count — the algorithm seems to be very
memory-hungry (I could not process even a small graph of ~5 million
vertices and 70 million edges with less than 32 GB of RAM on EACH machine).
If I have a graph with a billion edges, how much RAM would I need then?
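A rough back-of-envelope suggests why 70 million edges already hurts: triangle counting materializes a neighbor set per vertex, and each undirected edge lands in two of those sets. The constants below (8-byte vertex ids, ~48 bytes per hash-set entry including JVM overhead) are assumptions, not measurements:

```scala
// Back-of-envelope estimate of the per-vertex neighbor sets that triangle
// counting builds. The bytes-per-entry constant is an assumed JVM overhead
// figure, not a measurement.
object NbrSetEstimate {
  def estimateGB(edges: Double, bytesPerEntry: Double = 48.0): Double = {
    val entries = 2 * edges  // each edge appears in two neighbor sets
    entries * bytesPerEntry / math.pow(1024, 3)
  }
}

println(f"${NbrSetEstimate.estimateGB(70e6)}%.1f GB before any replication")
```

And this is before GraphX replicates vertex data out to the edge partitions, which can multiply the real footprint several times over.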

So now I am trying to understand how it works and maybe rewrite it. I would
like to process big graphs without needing so much RAM on each machine.
On 20.07.2015 04:27, "Jack Yang" <jiey@uow.edu.au> wrote:
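For anyone trying to follow the same code: the core idea behind GraphX's triangle count is to build a neighbor set per vertex and intersect the two sets at the ends of each edge. Here is a minimal single-machine sketch of that idea in plain Scala — not the GraphX implementation itself, which shards this across partitions; names are illustrative:

```scala
// Single-machine sketch of the neighbor-set-intersection idea behind
// triangle counting. Illustrative only, not the GraphX code.
object TriangleSketch {
  def triangleCount(edges: Seq[(Long, Long)]): Map[Long, Int] = {
    // Canonical orientation (src < dst), no self-loops, no duplicates --
    // the same preconditions GraphX's TriangleCount assumes of its input.
    val canonical = edges
      .map { case (a, b) => if (a < b) (a, b) else (b, a) }
      .filter { case (a, b) => a != b }
      .distinct
    // One neighbor set per vertex: this is the memory-hungry part.
    val nbrs: Map[Long, Set[Long]] = canonical
      .flatMap { case (a, b) => Seq((a, b), (b, a)) }
      .groupBy(_._1)
      .map { case (v, es) => v -> es.map(_._2).toSet }
    // A triangle through an edge is a common neighbor of its endpoints.
    // Summed over all edges, every vertex sees each of its triangles
    // exactly twice (once per incident edge of the triangle), so halve.
    val counts = scala.collection.mutable.Map[Long, Int]().withDefaultValue(0)
    for ((a, b) <- canonical) {
      val common = (nbrs(a) intersect nbrs(b)).size
      counts(a) += common
      counts(b) += common
    }
    nbrs.keys.map(v => v -> counts(v) / 2).toMap
  }
}
```

The memory cost is dominated by `nbrs`: two set entries per edge, so reducing it means either a more compact neighbor encoding or processing edges in a different order, which is presumably what a rewrite would target.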

>  Hi there,
>
>
>
> I got an error when running a simple GraphX program.
>
> My setup is: Spark 1.4.0, Hadoop YARN 2.5, Scala 2.10, with four virtual
> machines.
>
>
>
> When I construct a small graph (6 nodes, 4 edges) and run:
>
> println("triangleCount: %s ".format(
> hdfs_graph.triangleCount().vertices.count() ))
>
> it returns the correct result.
>
>
>
> But when I import a much larger graph (850,000 nodes, 5,000,000 edges),
> the error is:
>
> 15/07/20 12:03:36 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 11.0 (TID 32, 192.168.157.131): java.lang.AssertionError: assertion failed
>         at scala.Predef$.assert(Predef.scala:165)
>         at org.apache.spark.graphx.lib.TriangleCount$$anonfun$7.apply(TriangleCount.scala:90)
>         at org.apache.spark.graphx.lib.TriangleCount$$anonfun$7.apply(TriangleCount.scala:87)
>         at org.apache.spark.graphx.impl.VertexPartitionBaseOps.leftJoin(VertexPartitionBaseOps.scala:140)
>         at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:159)
>         at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:156)
>         at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>
> I ran both graphs with the same submit command:
>
> spark-submit --class "sparkUI.GraphApp" --master spark://master:7077
> --executor-memory 2G  --total-executor-cores 4 myjar.jar
>
>
>
> Any thoughts? Is anything wrong with my machine or configuration?
>
> Best regards,
>
> Jack
>
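For anyone else hitting the same assertion: the GraphX programming guide for Spark 1.x notes that TriangleCount requires the edges to be in canonical orientation (srcId < dstId) and the graph to be partitioned using Graph.partitionBy. The assertion that fails checks that each vertex's double-counted triangle total is even, which breaks when the input contains duplicate or non-canonically oriented edges. A sketch of the usual preprocessing (the HDFS path is a placeholder, and `sc` is an existing SparkContext):

```scala
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

// canonicalOrientation = true makes GraphLoader emit every edge with
// srcId < dstId, as TriangleCount expects; partitionBy with an explicit
// strategy is the other documented precondition.
val graph = GraphLoader
  .edgeListFile(sc, "hdfs:///path/to/edges.txt", canonicalOrientation = true)
  .partitionBy(PartitionStrategy.RandomVertexCut)

println("triangleCount: %s ".format(graph.triangleCount().vertices.count()))
```

If the graph is built some other way (e.g. from an RDD of Edge objects), the equivalent step is to swap src/dst where needed and deduplicate the edges before calling triangleCount().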
