The additional methods on RDDs of pairs are defined in a class called PairRDDFunctions (https://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions). SparkContext provides an implicit conversion from RDD[(K, V)] to PairRDDFunctions[K, V] to make this transparent to users.
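As a rough illustration of the mechanism, here is a toy sketch in plain Scala. It is not Spark's actual code: MiniRDD, MiniPairFunctions, MiniContext and ImplicitDemo are made-up names that only mimic the shape of RDD, PairRDDFunctions and SparkContext, and the scala.language import assumes Scala 2.10 or later.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for RDD[T]; it just wraps a local sequence.
class MiniRDD[T](val items: Seq[T])

// Hypothetical stand-in for PairRDDFunctions[K, V]: the extra methods
// that only make sense when the elements are key/value pairs.
class MiniPairFunctions[K, V](self: MiniRDD[(K, V)]) {
  def reduceByKey(f: (V, V) => V): MiniRDD[(K, V)] = {
    val reduced = self.items
      .groupBy(_._1)                                        // group pairs by key
      .map { case (k, kvs) => (k, kvs.map(_._2).reduce(f)) } // combine values per key
      .toSeq
    new MiniRDD(reduced)
  }
}

// Hypothetical stand-in for the SparkContext object that holds the conversion.
object MiniContext {
  implicit def toPairFunctions[K, V](rdd: MiniRDD[(K, V)]): MiniPairFunctions[K, V] =
    new MiniPairFunctions(rdd)
}

object ImplicitDemo {
  // Brings the implicit conversion into scope.
  import MiniContext._

  def counts(words: Seq[String]): Map[String, Int] =
    new MiniRDD(words.map(w => (w, 1))).reduceByKey(_ + _).items.toMap
}
```

If the import of MiniContext._ is removed, the call to reduceByKey no longer compiles, because MiniRDD itself has no such method. That is the same kind of failure as a "value reduceByKey is not a member of ..." error.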
To import those implicit conversions, use
import org.apache.spark.SparkContext._
These conversions are automatically imported by Spark Shell, but you'll have to import them yourself in standalone programs.
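For a standalone program against Spark 0.8.x, a minimal sketch could look like the following. The object name WordCount, the "local" master and the application name are placeholders chosen for illustration, and the elided hdfs://... path has to be replaced with a real input location before this will run.

```scala
import org.apache.spark.SparkContext
// Brings the implicit conversion to PairRDDFunctions into scope,
// which is what makes reduceByKey available on an RDD of pairs.
import org.apache.spark.SparkContext._

object WordCount {
  def main(args: Array[String]): Unit = {
    // Placeholder master URL and application name.
    val spark = new SparkContext("local", "WordCount")

    val file = spark.textFile("hdfs://...")
    val counts = file.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    spark.stop()
  }
}
```

Without the second import, the compiler only sees the methods declared on RDD itself, so reduceByKey is reported as not being a member of RDD[(String, Int)].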
On Thu, Nov 7, 2013 at 11:54 AM, Philip Ogren <firstname.lastname@example.org> wrote:
On the front page of the Spark website there is the following simple word count implementation:
val file = spark.textFile("hdfs://...")
file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
The same code can be found in the Quick Start guide. When I follow the steps in my spark-shell (version 0.8.0) it works fine. The reduceByKey method is also shown in the list of transformations in the Spark Programming Guide. The bottom of this list directs the reader to the API docs for the class RDD (this link is broken, BTW). The API docs for RDD do not list a reduceByKey method. Also, when I try to compile the above code in a Scala class definition, I get the following compile error:
value reduceByKey is not a member of org.apache.spark.rdd.RDD[(java.lang.String, Int)]
I am compiling with maven using the following dependency definition:
Can someone help me understand why this code works fine from the spark-shell but doesn't seem to exist in the API docs and won't compile?