spark-user mailing list archives

From Rajat Kumar <rajatkumar10...@gmail.com>
Subject spark rdd grouping
Date Tue, 01 Dec 2015 01:46:10 GMT
Hi

I have a JavaPairRDD<K,V> rdd1. I want to group rdd1 by key, but preserve the
original partitioning to avoid a shuffle, since I know all records with the
same key are already in the same partition.

The pair RDD is constructed using a Kafka streaming low-level consumer, which
places all records with the same key in the same partition. Can I group them
together while avoiding a shuffle?
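[One approach, not from the original thread: if equal keys are already co-located, each partition can be grouped locally with JavaPairRDD.mapPartitionsToPair, which runs once per partition and triggers no shuffle (plain groupByKey would shuffle here, since an RDD built from a Kafka consumer has no partitioner set). The sketch below shows only the per-partition grouping logic one would wrap in that call; to keep it dependency-free it uses Map.Entry in place of Spark's scala.Tuple2, and the class name PartitionLocalGroup is hypothetical.]

```java
import java.util.*;

// Sketch of partition-local grouping: the same logic one would pass to
// JavaPairRDD.mapPartitionsToPair(...) so that records sharing a key are
// grouped without a shuffle, assuming equal keys already live in the same
// partition. Spark's Tuple2 is replaced by Map.Entry to stay dependency-free.
public class PartitionLocalGroup {

    // Group an iterator of (key, value) records into (key, list-of-values).
    public static <K, V> Map<K, List<V>> groupPartition(Iterator<Map.Entry<K, V>> records) {
        Map<K, List<V>> grouped = new LinkedHashMap<>();
        while (records.hasNext()) {
            Map.Entry<K, V> rec = records.next();
            grouped.computeIfAbsent(rec.getKey(), k -> new ArrayList<>()).add(rec.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // One simulated partition: two records for "a", one for "b".
        List<Map.Entry<String, Integer>> partition = Arrays.asList(
                new AbstractMap.SimpleEntry<>("a", 1),
                new AbstractMap.SimpleEntry<>("b", 2),
                new AbstractMap.SimpleEntry<>("a", 3));
        Map<String, List<Integer>> grouped = groupPartition(partition.iterator());
        System.out.println(grouped); // {a=[1, 3], b=[2]}
    }
}
```

Inside Spark the same function body would consume the Iterator<Tuple2<K,V>> handed to mapPartitionsToPair and emit one Tuple2<K, Iterable<V>> per key, all without moving data between executors.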

Thanks
