spark-user mailing list archives

From: Philip Ogren
Subject: Re: Where is reduceByKey?
Date: Thu, 07 Nov 2013 21:15:30 GMT
I remember running into something very similar when trying to perform a 
foreach on a java.util.List, and I fixed it by adding the following import:

import scala.collection.JavaConversions._

And my foreach loop magically compiled - presumably due to another 
implicit conversion.  Now this is the second time I've run into this 
problem, and I didn't recognize it.  I'm not sure that I would know what 
to do the next time I run into it.  Do you have some advice on how I 
should have recognized a missing import that provides implicit 
conversions, and how I would know what to import?  This strikes me as 
code obfuscation.  I guess this is more of a Scala question....
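
For reference, here is a minimal, self-contained sketch of the 
java.util.List case I mean (the object name and list contents are 
made up):

import scala.collection.JavaConversions._  // brings asScalaBuffer into scope

object ForeachExample {
  def main(args: Array[String]) {
    val names: java.util.List[String] = java.util.Arrays.asList("a", "b", "c")
    // Without the import above, this line fails to compile with
    // "value foreach is not a member of java.util.List[String]";
    // with it, the compiler inserts asScalaBuffer(names) first.
    names.foreach(n => println(n))
  }
}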


On 11/7/2013 2:01 PM, Josh Rosen wrote:
> The additional methods on RDDs of pairs are defined in a class called 
> PairRDDFunctions.  SparkContext provides an implicit conversion from 
> RDD[(K, V)] to PairRDDFunctions[K, V] to make this transparent to users.
> To import those implicit conversions, use
>     import org.apache.spark.SparkContext._
> These conversions are automatically imported by Spark Shell, but 
> you'll have to import them yourself in standalone programs.
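> For completeness, a standalone program would look something like this 
> (a minimal sketch; the master URL and HDFS paths are placeholders):
>
>     import org.apache.spark.SparkContext
>     import org.apache.spark.SparkContext._  // implicit RDD -> PairRDDFunctions
>
>     object WordCount {
>       def main(args: Array[String]) {
>         val sc = new SparkContext("local", "WordCount")
>         val counts = sc.textFile("hdfs://...")
>           .flatMap(line => line.split(" "))
>           .map(word => (word, 1))
>           .reduceByKey(_ + _)  // resolves through the implicit conversion
>         counts.saveAsTextFile("hdfs://...")
>       }
>     }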
> On Thu, Nov 7, 2013 at 11:54 AM, Philip Ogren wrote:
>     On the front page of the Spark website there is the following
>     simple word count implementation:
>     file = spark.textFile("hdfs://...")
>     file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
>     The same code can be found in the Quick Start guide.  When I
>     follow the steps in my spark-shell (version 0.8.0) it works fine.
>     The reduceByKey method is also shown in the list of
>     transformations in the Spark Programming Guide.  The bottom of
>     this list directs the reader to the API docs for the class RDD
>     (this link is broken, BTW).  The API docs for RDD do not list a
>     reduceByKey method.  Also, when I try to
>     compile the above code in a Scala class definition I get the
>     following compile error:
>     value reduceByKey is not a member of
>     org.apache.spark.rdd.RDD[(java.lang.String, Int)]
>     I am compiling with Maven using the following dependency definition:
>     <dependency>
>       <groupId>org.apache.spark</groupId>
>       <artifactId>spark-core_2.9.3</artifactId>
>       <version>0.8.0-incubating</version>
>     </dependency>
>     Can someone help me understand why this code works fine from the
>     spark-shell but doesn't seem to exist in the API docs and won't
>     compile?
>     Thanks,
>     Philip
