dropping user list, adding dev.


Thanks, Justin, for the poc. This is a good idea to explore, especially for Spark 2.0.


On Fri, May 22, 2015 at 12:08 PM, Justin Pihony <justin.pihony@gmail.com> wrote:
The (crude) proof of concept seems to work:

class RDD[V](val value: List[V]) {
  def doStuff = println("I'm doing stuff")
}

object RDD {
  // any RDD whose elements are pairs can be viewed as a PairRDD
  implicit def toPair[K, V](x: RDD[(K, V)]): PairRDD[K, V] = new PairRDD(x.value)
}

class PairRDD[K, V](value: List[(K, V)]) extends RDD[(K, V)](value) {
  def doPairs = println("I'm using pairs")
}

class Context {
  def parallelize[K, V](x: List[(K, V)]): PairRDD[K, V] = new PairRDD(x)
  // DummyImplicit avoids the "same type after erasure" clash with the overload above
  def parallelize[V](x: List[V])(implicit d: DummyImplicit): RDD[V] = new RDD(x)
}
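A quick REPL check of the idea (definitions repeated so the snippet stands alone; the `DummyImplicit` tiebreaker is my workaround for the erasure clash between the two `parallelize` overloads, and the methods return Strings instead of printing so the calls are easy to check -- both tweaks are mine, not part of the PoC):

```scala
class RDD[V](val value: List[V]) {
  def doStuff: String = "I'm doing stuff"
}

object RDD {
  // any RDD whose elements are pairs can be viewed as a PairRDD
  implicit def toPair[K, V](x: RDD[(K, V)]): PairRDD[K, V] = new PairRDD(x.value)
}

class PairRDD[K, V](value: List[(K, V)]) extends RDD[(K, V)](value) {
  def doPairs: String = "I'm using pairs"
}

class Context {
  def parallelize[K, V](x: List[(K, V)]): PairRDD[K, V] = new PairRDD(x)
  // DummyImplicit dodges the "same type after erasure" overload clash
  def parallelize[V](x: List[V])(implicit d: DummyImplicit): RDD[V] = new RDD(x)
}

val ctx = new Context
val pairs = ctx.parallelize(List((1, "a"), (2, "b"))) // statically a PairRDD
val plain = ctx.parallelize(List(1, 2, 3))            // statically an RDD

println(pairs.doPairs) // available directly, no implicit conversion needed
println(plain.doStuff)

// the implicit conversion still kicks in for a plain RDD that holds pairs,
// which is what keeps existing call sites compiling:
val viaImplicit: RDD[(Int, String)] = new RDD(List((1, "a")))
println(viaImplicit.doPairs)
```

The point of the overloaded `parallelize` is that the pair-shaped overload is more specific, so pair inputs get the richer static type while everything else falls through to the plain one.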

On Fri, May 22, 2015 at 2:44 PM, Reynold Xin <rxin@databricks.com> wrote:
I'm not sure if it is possible to overload the map function twice, once for just KV pairs, and another for K and V separately.
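For context on why this is tricky: two `map` overloads that differ only in their type parameters erase to the same JVM signature, so scalac rejects the pair side by side. One possible workaround is a `DummyImplicit` tiebreaker on the general overload -- a sketch of mine, not Spark's actual API:

```scala
class RDD[V](val value: List[V]) {
  // Without the DummyImplicit, these two overloads both erase to
  // map(Function1): Object and fail with "have same type after erasure".
  def map[K2, V2](f: V => (K2, V2)): PairRDD[K2, V2] = new PairRDD(value.map(f))
  def map[U](f: V => U)(implicit d: DummyImplicit): RDD[U] = new RDD(value.map(f))
}

class PairRDD[K, V](value: List[(K, V)]) extends RDD[(K, V)](value) {
  def doPairs: String = "I'm using pairs"
}

val rdd = new RDD(List(1, 2, 3))
// the pair overload is more specific, so pair-producing functions pick it up:
val pairRdd = rdd.map(x => (x, x * x)) // statically PairRDD[Int, Int]
// a non-pair result rules the pair overload out, so this stays a plain RDD:
val plain = rdd.map(_ + 1)             // statically RDD[Int]
```

Whether this is acceptable for the real `RDD.map` is another question -- the dummy parameter leaks into the public signature -- but it shows the double overload is at least expressible.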


On Fri, May 22, 2015 at 10:26 AM, Justin Pihony <justin.pihony@gmail.com> wrote:
This ticket improved the RDD API, but the pair operations could be even more discoverable if they were available on the API directly. I assume the split was originally an omission that now has to be kept for backwards compatibility, but would any of the repo owners be open to making this more discoverable, to the point of showing up in the API docs and in tab completion, while keeping both binary and source compatibility?


    class PairRDD[K, V] extends RDD[(K, V)] {
      // ... pair methods
    }

    class RDD[T] {
      def map[K: ClassTag, V: ClassTag](f: T => (K, V)): PairRDD[K, V]
    }

As long as the implicits remain, compatibility remains, but now it is explicit in the docs and in tab completion how to get a PairRDD.

Thoughts?

Justin Pihony