spark-user mailing list archives

From Ankur Dave <ankurd...@gmail.com>
Subject Re: GraphX partition problem
Date Thu, 22 May 2014 15:59:14 GMT
The fix will be included in Spark 1.0, but if you just want to apply the
fix to 0.9.1, here's a hotfixed version of 0.9.1 that only includes PR
#367: https://github.com/ankurdave/spark/tree/v0.9.1-handle-empty-partitions.
You can clone and build this.
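
For reference, cloning and building that tree might look roughly like the sketch below. The repository URL and ref name come from the link above. The checkout directory name is a hypothetical choice, and the build step assumes the bundled `sbt/sbt` launcher that Spark 0.9.x uses, with the default Hadoop version.

```shell
#!/bin/sh
# Sketch only: fetch and build the hotfixed 0.9.1 tree linked above.
REPO_URL="https://github.com/ankurdave/spark.git"
REF="v0.9.1-handle-empty-partitions"
CHECKOUT_DIR="spark-0.9.1-hotfix"   # hypothetical local directory name

fetch_and_build() {
    # --branch accepts either a branch or a tag name.
    git clone --branch "$REF" "$REPO_URL" "$CHECKOUT_DIR" || return 1
    # Build the assembly jar with the sbt launcher bundled in the source tree.
    # Assumption: the default Hadoop version is acceptable; if it is not,
    # set SPARK_HADOOP_VERSION to match the cluster before building.
    ( cd "$CHECKOUT_DIR" && sbt/sbt assembly )
}
```

Sourcing the script and then running `fetch_and_build` performs the clone and the build; nothing touches the network until the function is called.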

Ankur <http://www.ankurdave.com/>


On Thu, May 22, 2014 at 4:53 AM, Zhicharevich, Alex
<azhicharevich@ebay.com> wrote:

> Hi,
>
> I’m running simple connected components code using GraphX (version 0.9.1).
>
> My input comes from an HDFS text file partitioned into 400 parts. When I run
> the code on a single part or a small number of files (around 20), it runs
> fine. As soon as I try to read more files (more than 30), I get an error
> and the job fails.
>
> From the logs, I see the following exception:
>
> java.util.NoSuchElementException: End of stream
>     at org.apache.spark.util.NextIterator.next(NextIterator.scala:83)
>     at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:29)
>     at org.apache.spark.graphx.impl.RoutingTable$$anonfun$1.apply(RoutingTable.scala:52)
>     at org.apache.spark.graphx.impl.RoutingTable$$anonfun$1.apply(RoutingTable.scala:51)
>     at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:456)
>
> From searching the web, I see it’s a known issue with GraphX, described
> here: https://github.com/apache/spark/pull/367
> and here: https://github.com/apache/spark/pull/497
>
> Is there a stable release that includes this fix? Should I clone the
> git repo and build it myself? How would you advise me to deal with this
> issue?
>
> Thanks,
> Alex
