spark-user mailing list archives

From: Lukas Nalezenec <>
Subject: strange StreamCorruptedException
Date: Thu, 17 Apr 2014 14:39:23 GMT
Hi all,

I am running an algorithm similar to word count and I am not sure why it
fails at the end. There are only about 200 distinct words, so the result
of the computation should be small.

I have a SIMR command line with Spark 0.8.1 and 50 workers, each with
~512 MB RAM.
The dataset is a 100 GB tab-separated text HadoopRDD with ~6000 partitions.

My command line is:

    ….map(x => x.split("\t")).map(x => (x(2), x(3).toInt)).reduceByKey(_ + _).collect
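The logic is equivalent to this plain-Scala sketch (local only, no Spark; the `WordSum` object and the sample rows are made up for illustration):

```scala
// Local stand-in for the Spark job: split tab-separated lines,
// take field 2 as the key and field 3 as a count, and sum per key.
object WordSum {
  def aggregate(lines: Seq[String]): Map[String, Int] =
    lines
      .map(_.split("\t"))
      .map(cols => (cols(2), cols(3).toInt))
      .groupBy(_._1)                       // local analogue of reduceByKey
      .mapValues(_.map(_._2).sum)
      .toMap

  def main(args: Array[String]): Unit = {
    val sample = Seq("a\tb\tfoo\t1", "a\tb\tfoo\t2", "a\tb\tbar\t5")
    println(WordSum.aggregate(sample))
  }
}
```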

 invalid type code: AC
	at …$$anon$1.getNext(Serializer.scala:101)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
	at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:26)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27)
	at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:53)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:95)

What am I doing wrong?

Best Regards
