I’m running simple connected components code using GraphX (version 0.9.1).
My input comes from an HDFS text file partitioned into 400 parts. When I run the code on a single part or a small number of parts (around 20), it runs fine. As soon as I try to read more of them (more than 30), I get an error and the job fails.
Looking at the logs, I see the following exception:
java.util.NoSuchElementException: End of stream
From searching the web, it appears to be a known issue with GraphX.
See: https://github.com/apache/spark/pull/497
Is there a stable release that includes this fix? Should I clone the git repo and build it myself? How would you advise me to deal with this issue?