Hi,

My input is about 72 GB. I did a group-by first and then a repartition; however, the repartition step is very slow.

[Inline image 1]

Any ideas why GC takes 80% of the time?


Cheers,
--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
GitHub & Blog: http://huangjs.github.com/