spark-user mailing list archives

From 诺铁 <noty...@gmail.com>
Subject confused by reduceByKey usage
Date Thu, 17 Apr 2014 16:29:18 GMT
Hi,

I am new to Spark. While trying to write some simple tests in the Spark
shell, I ran into the following problem.

I created a very small text file named 5.txt:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

and experimented in the Spark shell:

scala> val d5 = sc.textFile("5.txt").cache()
d5: org.apache.spark.rdd.RDD[String] = MappedRDD[91] at textFile at
<console>:12

scala> d5.keyBy(_.split(" ")(0)).reduceByKey((v1, v2) =>
         (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString).first

Then this error occurs:
14/04/18 00:20:11 ERROR Executor: Exception in task ID 36
java.lang.ArrayIndexOutOfBoundsException: 1
at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$$anonfun$2.apply(ExternalAppendOnlyMap.scala:120)

When I delete one line from the file, leaving two lines, the result is
correct. I don't understand what the problem is. Please help me. Thanks.
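For context, the failure can be reproduced without Spark. The function passed to reduceByKey must accept already-reduced values as arguments: after the first reduction the accumulated value is just "4", which has no second token, so v1.split(" ")(1) throws ArrayIndexOutOfBoundsException. With only two lines the function runs once on two full lines, which is why that case happens to work. A minimal pure-Scala sketch of this behavior (names here are illustrative, not from the original post):

```scala
object ReduceSketch {
  val lines = List("1 2 3 4 5", "1 2 3 4 5", "1 2 3 4 5")

  // The posted reduce function: it assumes both arguments are full lines,
  // but after one reduction the accumulator is just "4", and
  // "4".split(" ")(1) is out of bounds.
  def badReduce(v1: String, v2: String): String =
    (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString

  // One way to avoid this: extract the value once in a map step,
  // then reduce plain Ints (in Spark: mapValues before reduceByKey).
  def sum: Int = lines.map(_.split(" ")(1).toInt).reduce(_ + _)
}
```

Reducing over all three lines with badReduce fails on the second application, while mapping to Int first sums the second column cleanly.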
