spark-user mailing list archives

From Ankur Srivastava <>
Subject Would Join on PairRDD's result in co-locating data by keys?
Date Thu, 22 Jan 2015 17:44:16 GMT

I want to understand how a join on two pair RDDs works. Does it shuffle the
data from both RDDs so that records with the same key end up in the same
partition? If so, would it be better to use the partitionBy function to
partition each RDD by the join key when it is created, so that less
shuffling is needed?
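A toy, pure-Python model of the idea behind this question may help. It does not use Spark at all. It assumes plain hash partitioning, where a key goes to partition `hash(key) % n`. This is roughly what Spark's `HashPartitioner` does with the key's hash code. If both pair datasets are split with the same function and the same partition count, matching keys always land at the same partition index. The join can then be done partition by partition, with no records moving between partitions. The helper names `hash_partition`, `local_join` and `copartitioned_join` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Toy model, not Spark's API: partition index = hash(key) % num_partitions.
def hash_partition(pairs, num_partitions):
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

# Join one left partition against the right partition with the same index.
def local_join(left_part, right_part):
    index = defaultdict(list)
    for key, value in right_part:
        index[key].append(value)
    return [(key, (v, w)) for key, v in left_part for w in index.get(key, [])]

# Because both sides used the same partitioning function and count, matching
# keys sit at the same partition index, so no records move between partitions.
def copartitioned_join(left_parts, right_parts):
    if len(left_parts) != len(right_parts):
        raise ValueError("both sides must use the same number of partitions")
    result = []
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(local_join(left_part, right_part))
    return result

if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
    right = [(2, "x"), (3, "y"), (3, "z"), (5, "w")]
    joined = copartitioned_join(hash_partition(left, 3), hash_partition(right, 3))
    print(sorted(joined))  # [(2, ('b', 'x')), (3, ('c', 'y')), (3, ('c', 'z'))]
```

In Spark itself, the corresponding step would be calling `partitionBy` on the pair RDDs before `join`. Whether this actually avoids a shuffle at join time depends on both RDDs ending up with the same partitioner. Keeping the partitioned RDD around, for example by persisting it, also matters, so the partitioning work is not repeated.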
