I'm running into a very strange error that occurs halfway through long-running Spark SQL jobs:
18/01/12 22:14:30 ERROR Utils: Aborting task
java.io.EOFException: reached end of stream after reading 0 bytes; 96 bytes expected
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
Since I get this in several jobs, I wonder if it might be a problem in the communication layer.
Did anyone face a similar problem?
It always happens in a job that shuffles about 200 GB and then reads it back in partitions of roughly 64 MB for a groupBy. The odd part is that it only fails after more than 1000 partitions have been processed (16 cores on a single node).
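For context, the job is roughly of the shape below. This is only a simplified sketch; the input path, key, and aggregation are placeholders, not the real ones:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    val spark = SparkSession.builder().appName("groupByJob").getOrCreate()

    // ~200 GB of input that gets shuffled for the groupBy
    val df = spark.read.parquet("hdfs:///data/input")

    // the groupBy forces a full shuffle; the read side pulls partitions of ~64 MB
    val result = df
      .groupBy("someKey")
      .agg(sum("someValue").as("total"))

    result.write.mode("overwrite").parquet("hdfs:///data/output")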
I even tried changing the spark.shuffle.file.buffer setting, but that just seems to shift the point at which the failure occurs.
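In case it matters, this is how I've been setting the buffer (the value shown is just one of the sizes I tried; the same can be passed as --conf spark.shuffle.file.buffer=1m on spark-submit):

    // set at session construction time; the default is 32k
    val spark = SparkSession.builder()
      .appName("groupByJob")
      .config("spark.shuffle.file.buffer", "1m")
      .getOrCreate()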
I would really appreciate some hints on what this could be, what to try or test, and how to debug it, as I feel pretty much blocked here.
Thanks in advance