spark-user mailing list archives

From Philip Ogren
Subject Re: Where is reduceByKey?
Date Thu, 07 Nov 2013 23:46:25 GMT
Thanks - I think this would be a helpful note to add to the docs.  I 
went and read a few things about Scala implicit conversions (I'm 
obviously new to the language) and it seems like a very powerful 
language feature.  Now that I know about them, it will certainly be 
easier to identify when one is missing (i.e. it's the first thing to 
suspect when you see a "not a member" compilation error).  I'm still a 
bit mystified as to how you would go about finding the appropriate 
imports, except that I suppose you aren't very likely to use methods 
that you don't already know about!  Unless you are copying code 
verbatim that doesn't have the necessary import statements....
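
Out of curiosity I put together a tiny, made-up sketch of the mechanism 
as I now understand it (the class, method, and conversion names below 
are invented purely for illustration):

    // a wrapper class that adds an extra method to Seq[Int]
    class RichSeq(underlying: Seq[Int]) {
      def mySum: Int = underlying.foldLeft(0)(_ + _)
    }

    // the implicit conversion that makes mySum "appear" on Seq[Int]
    implicit def seqToRichSeq(s: Seq[Int]): RichSeq = new RichSeq(s)

    Seq(1, 2, 3).mySum   // compiles only while seqToRichSeq is in scope;
                         // otherwise: "value mySum is not a member of Seq[Int]"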

On 11/7/2013 4:05 PM, Matei Zaharia wrote:
> Yeah, this is confusing and unfortunately as far as I know it’s API 
> specific. Maybe we should add this to the documentation page for RDD.
> The reason for these conversions is to only allow some operations 
> based on the underlying data type of the collection. For example, 
> Scala collections support sum() as long as they contain numeric types. 
> That’s fine for the Scala collection library since its conversions are 
> imported by default, but I guess it makes it confusing for third-party 
> apps.
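> For example, just to make that concrete (this snippet is only an 
> illustration):
>     List(1, 2, 3).sum        // fine: the element type is numeric
>     List("a", "b", "c").sum  // won't compile: no implicit Numeric[String] in scope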
> Matei
> On Nov 7, 2013, at 1:15 PM, Philip Ogren wrote:
>> I remember running into something very similar when trying to perform 
>> a foreach on java.util.List and I fixed it by adding the following 
>> import:
>> import scala.collection.JavaConversions._
>> And my foreach loop magically compiled - presumably due to another 
>> implicit conversion.  Now this is the second time I've run into this 
>> problem and I didn't recognize it.  I'm not sure that I would know 
>> what to do the next time I run into this.  Do you have some advice on 
>> how I should have recognized a missing import that provides implicit 
>> conversions and how I would know what to import?  This strikes me as 
>> code obfuscation.  I guess this is more of a Scala question....
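>> For reference, the snippet I mean looked roughly like this (the 
>> variable names are made up):
>>     import scala.collection.JavaConversions._
>>     val names: java.util.List[String] = new java.util.ArrayList[String]()
>>     names.add("spark")
>>     names.foreach(println)  // only compiles with the JavaConversions import in scope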
>> Thanks,
>> Philip
>> On 11/7/2013 2:01 PM, Josh Rosen wrote:
>>> The additional methods on RDDs of pairs are defined in a class 
>>> called PairRDDFunctions.  SparkContext provides an implicit conversion from RDD[T] to 
>>> PairRDDFunctions[T] to make this transparent to users.
>>> To import those implicit conversions, use
>>>     import org.apache.spark.SparkContext._
>>> These conversions are automatically imported by Spark Shell, but 
>>> you'll have to import them yourself in standalone programs.
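>>> A minimal standalone program would look roughly like this (the object 
>>> and app names here are just placeholders, assuming Spark 0.8.0 on the 
>>> classpath and a local master):
>>>     import org.apache.spark.SparkContext
>>>     import org.apache.spark.SparkContext._  // brings the pair-RDD conversions into scope
>>> 
>>>     object WordCount {
>>>       def main(args: Array[String]) {
>>>         val spark = new SparkContext("local", "WordCount")
>>>         val counts = spark.textFile(args(0))
>>>           .flatMap(line => line.split(" "))
>>>           .map(word => (word, 1))
>>>           .reduceByKey(_ + _)        // resolves because of the imported conversion
>>>         counts.collect().foreach(println)
>>>       }
>>>     }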
>>> On Thu, Nov 7, 2013 at 11:54 AM, Philip Ogren wrote:
>>>     On the front page of the
>>>     Spark website there is the following simple word count
>>>     implementation:
>>>     file = spark.textFile("hdfs://...")
>>>     file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
>>>     The same code can be found in the Quick Start
>>>     guide.  When I follow the steps in my spark-shell (version
>>>     0.8.0) it works fine.  The reduceByKey method is also shown in
>>>     the list of transformations
>>>     in the Spark Programming Guide.  The bottom of this list directs
>>>     the reader to the API docs for the class RDD (this link is
>>>     broken, BTW). The API docs for RDD
>>>     do not list a reduceByKey method for RDD.  Also, when I try to
>>>     compile the above code in a Scala class definition I get the
>>>     following compile error:
>>>     value reduceByKey is not a member of
>>>     org.apache.spark.rdd.RDD[(java.lang.String, Int)]
>>>     I am compiling with maven using the following dependency definition:
>>>         <dependency>
>>>             <groupId>org.apache.spark</groupId>
>>>             <artifactId>spark-core_2.9.3</artifactId>
>>>             <version>0.8.0-incubating</version>
>>>         </dependency>
>>>     Can someone help me understand why this code works fine from the
>>>     spark-shell but reduceByKey doesn't appear in the API docs and the
>>>     same code won't compile?
>>>     Thanks,
>>>     Philip
