I have seen that link. I am using an RDD of byte arrays with Kryo serialization. When I measure the time inside mapPartitions, it is never more than 1 ms, whereas the application as a whole takes about 30 minutes. The codebase has a lot of dependencies, so I am trying to come up with a simplified version that reproduces the problem.
Also, the GC time reported by the Spark UI is always in the range of 3-4% of the total time.

It would be great if you could share the code that runs inside your mapPartitions. I'm assuming you are creating or handling a lot of complex objects, and that this is what slows things down. Here's a link to performance tuning if you haven't seen it already.

I have a Spark app that involves a series of mapPartitions operations followed by a keyBy operation. I have measured the time spent inside each mapPartitions function block, and these blocks take a trivial amount of time. Still, the application takes far too long, and the Spark UI reports the same long durations.
So I was wondering where the time is going and how I can reduce it.
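
One thing that might be worth ruling out, assuming the function passed to mapPartitions returns a lazy iterator (for example the result of mapping over the input iterator): a timer placed inside the function body only measures how long it takes to build that iterator, not the per-record work, which runs later when Spark consumes it. The sketch below reproduces the effect in plain Python without Spark. The names expensive, process_partition and run are made up for illustration and are not from the original application.

```python
import time


def expensive(x):
    # Hypothetical stand-in for the real per-record work.
    time.sleep(0.001)
    return x * 2


def process_partition(records):
    # Mimics a function passed to mapPartitions: iterator in, iterator out.
    start = time.perf_counter()
    result = map(expensive, records)  # lazy: no record has been processed yet
    elapsed_inside = time.perf_counter() - start
    return result, elapsed_inside


def run(n=200):
    out, elapsed_inside = process_partition(iter(range(n)))
    # The real work happens only when the iterator is consumed downstream.
    start = time.perf_counter()
    materialized = list(out)
    elapsed_consume = time.perf_counter() - start
    return materialized, elapsed_inside, elapsed_consume


if __name__ == "__main__":
    _, inside, consume = run()
    print(f"measured inside the function: {inside:.6f} s")
    print(f"measured while consuming:     {consume:.6f} s")
```

If this is what is happening, the time measured inside the block will stay tiny no matter how expensive the per-record work is, and the per-stage and per-task durations in the Spark UI are the more reliable numbers.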