spark-dev mailing list archives

From Alexander Pivovarov <apivova...@gmail.com>
Subject rdd.distinct with Partitioner
Date Thu, 09 Jun 2016 03:42:55 GMT
Most of the RDD methods that shuffle data take a Partitioner as a parameter.

But rdd.distinct has no such signature.

Should I open a PR for that?

/**
 * Return a new RDD containing the distinct elements in this RDD.
 */
def distinct(partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
  map(x => (x, null)).reduceByKey(partitioner, (x, y) => x).map(_._1)
}
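
For comparison, the same pipeline can be written in user code without touching RDD itself. This is only a rough sketch: the object name DistinctWithPartitioner and the helper distinctWith are hypothetical names and not part of Spark, and the local[2] master, the sample data, and the HashPartitioner(4) are assumptions chosen for illustration.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DistinctWithPartitioner {

  // Hypothetical helper (not a Spark API): same map -> reduceByKey -> map
  // pipeline as the proposed overload, applied from outside the RDD class.
  def distinctWith[T: ClassTag](rdd: RDD[T], partitioner: Partitioner): RDD[T] =
    rdd.map(x => (x, null)).reduceByKey(partitioner, (x, _) => x).map(_._1)

  def main(args: Array[String]): Unit = {
    // Assumption: a local two-thread master, just for the demonstration.
    val conf = new SparkConf().setAppName("distinct-with-partitioner").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(1, 2, 2, 3, 3, 3), 3)
      val unique = distinctWith(rdd, new HashPartitioner(4))

      println(unique.collect().sorted.mkString(", ")) // 1, 2, 3
      println(unique.partitions.length)               // 4, taken from the partitioner
    } finally {
      sc.stop()
    }
  }
}
```

One thing to keep in mind with this shape: the final map(_._1) does not preserve partitioning, so the result keeps the partitioner's partition count but its partitioner field is None.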
