spark-user mailing list archives

From Jaka Jančar <>
Subject Re: partitioning via groupByKey
Date Wed, 19 Mar 2014 16:42:03 GMT
The former: a single new RDD is returned. Its tuples are redistributed by key across that one RDD's partitions.

Check the PairRDDFunctions docs (

def groupByKey(): RDD[(K, Seq[V])]
Group the values for each key in the RDD into a single sequence.
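
To make the "one RDD, many partitions" point concrete, here is a minimal sketch in plain Scala. It is a toy model, not Spark code: ordinary collections stand in for an RDD's partitions, and the names GroupByKeySketch and partitionFor are made up for illustration. It assumes hash partitioning as described in the question, i.e. a key's partition is its hashCode modulo the number of partitions.

```scala
// Toy model only: plain Scala collections standing in for the partitions of one RDD.
// Nothing here is Spark API; all names are hypothetical.
object GroupByKeySketch {
  // Map a key to a partition index; adjust negative remainders so the index is valid.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Model of the shuffle: send each pair to the partition chosen by its key's hash,
  // then gather the values of each key into a single sequence.
  // The result is ONE dataset (the outer Vector) made of numPartitions partitions.
  def groupByKey[K, V](pairs: Seq[(K, V)], numPartitions: Int): Vector[Map[K, Seq[V]]] = {
    val byPartition = pairs.groupBy { case (k, _) => partitionFor(k, numPartitions) }
    Vector.tabulate(numPartitions) { i =>
      byPartition.getOrElse(i, Seq.empty[(K, V)])
        .groupBy(_._1)
        .map { case (k, kvs) => k -> kvs.map(_._2) }
    }
  }

  def main(args: Array[String]): Unit = {
    val pairs = Seq("a" -> 1, "b" -> 2, "a" -> 3, "c" -> 4, "b" -> 5)
    val grouped = groupByKey(pairs, numPartitions = 3)
    // Print each partition of the single grouped result.
    grouped.zipWithIndex.foreach { case (part, i) => println(s"partition $i: $part") }
  }
}
```

In this model every value for a given key ends up in the same partition, and the partitions together still make up one result. In Spark itself you would just call groupByKey() on the pair RDD and get back the single RDD shown in the signature above.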

On Wednesday, March 19, 2014 at 9:32 AM, Adrian Mocanu wrote:

> When you partition via groupByKey, tuples (parts of the RDD) are moved from some node
to another node based on key (hash partitioning).
> Do the tuples remain part of one RDD as before, just moved to different nodes, or does this
shuffling create, say, several RDDs that each hold parts of the original RDD?
> Thanks
> -Adrian
