spark-user mailing list archives

From Robert Schmidtke <ro.schmid...@gmail.com>
Subject StreamCorruptedException during deserialization
Date Tue, 29 Mar 2016 09:25:37 GMT
Hi everyone,

I'm running the Intel HiBench TeraSort (1 TB) Spark Scala benchmark on Spark
1.6.0. After some time, one task fails too many times, despite being
rescheduled on different nodes, with the following stack trace:

16/03/27 22:25:04 WARN scheduler.TaskSetManager: Lost task 97.0 in stage 2.0 (TID 2337, <HOST>): java.io.StreamCorruptedException: invalid stream header: 32313530
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:806)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:64)
at org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:64)
at org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:123)
at org.apache.spark.shuffle.BlockStoreShuffleReader$$anonfun$3.apply(BlockStoreShuffleReader.scala:64)
at org.apache.spark.shuffle.BlockStoreShuffleReader$$anonfun$3.apply(BlockStoreShuffleReader.scala:60)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:197)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:103)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Is there a meaningful way for me to find out what exactly is going wrong
here? Any help and hints are greatly appreciated!
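One clue that may help with debugging: the four header bytes reported in the exception, 32313530, are the ASCII digits "2150", whereas a genuine Java serialization stream always begins with the magic 0xACED followed by version 0x0005. Plain text where serialized data is expected usually points at the shuffle bytes being mangled in transit or read with the wrong (de)compression. A minimal sketch of that decoding (the class and method names here are illustrative, not from Spark):

```java
public class HeaderCheck {
    // Decode a 4-byte header value, as printed by StreamCorruptedException
    // ("invalid stream header: XXXXXXXX"), into its ASCII characters.
    static String decodeHeader(int header) {
        StringBuilder sb = new StringBuilder();
        for (int shift = 24; shift >= 0; shift -= 8) {
            sb.append((char) ((header >>> shift) & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x32313530 is the header value from the exception above.
        // It decodes to the ASCII digits "2150" -- i.e. plain text,
        // not the 0xACED0005 magic of a Java serialization stream.
        System.out.println(decodeHeader(0x32313530));
    }
}
```

Since TeraSort records are ASCII, seeing readable digits in the header is consistent with raw record bytes reaching the deserializer directly.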

Thanks
Robert

-- 
My GPG Key ID: 336E2680
