spark-user mailing list archives

From: "Zhicharevich, Alex" <azhicharev...@ebay.com>
Subject: GraphX partition problem
Date: Thu, 22 May 2014 11:53:00 GMT

Hi,

I'm running simple connected-components code using GraphX (version 0.9.1).

My input comes from an HDFS text file partitioned into 400 parts. When I run the code on a single
part or a small number of files (like 20), the code runs fine. As soon as I try to read
more files (more than 30), I get an error and the job fails.
From looking at the logs, I see the following exception:
    java.util.NoSuchElementException: End of stream
        at org.apache.spark.util.NextIterator.next(NextIterator.scala:83)
        at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:29)
        at org.apache.spark.graphx.impl.RoutingTable$$anonfun$1.apply(RoutingTable.scala:52)
        at org.apache.spark.graphx.impl.RoutingTable$$anonfun$1.apply(RoutingTable.scala:51)
        at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:456)

From searching the web, I see it's a known issue with GraphX.
Here: https://github.com/apache/spark/pull/367
And here: https://github.com/apache/spark/pull/497

Are there any stable releases that include this fix? Should I clone the git repo and build
it myself? How would you advise me to deal with this issue?

Thanks,
Alex



