Is it possible to get the location of each partition of a cached RDD? I ask because I am trying to avoid shuffling when performing a coalesce operation. The sizes of my partitions can be quite imbalanced, so CoalescedRDD would probably not be a good solution in my case.

Thank you!


Wenlei Xie (谢文磊)

Department of Computer Science
5132 Upson Hall, Cornell University
Ithaca, NY 14853, USA
Phone: (607) 255-5577
Email: wenlei.xie@gmail.com