I found that Spark's repartitionAndSortWithinPartitions method takes about twice as long as the equivalent MapReduce job in some cases.
I want to repartition the dataset according to a set of split keys and save each partition to a file in ascending key order. As the doc says, repartitionAndSortWithinPartitions “is more efficient than calling `repartition` and then sorting within each partition because it can push the sorting down into the shuffle machinery.” I expected it to be faster than MapReduce, but it is actually much slower. I also adjusted several Spark configuration settings, but that did not help. (Both Spark and MapReduce run on the same three-node cluster and use the same number of partitions.)
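To make the intended behaviour concrete, here is a minimal local sketch of partitioning by split keys and sorting within each partition. It uses only the Python standard library and does not run Spark; the split keys, the sample records, and the names `partition_for` and `repartition_and_sort_locally` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical split keys: partition i holds keys in [split_keys[i-1], split_keys[i]).
split_keys = ["g", "n", "t"]
num_partitions = len(split_keys) + 1


def partition_for(key):
    """Return the index of the key range that contains `key`."""
    return bisect_right(split_keys, key)


def repartition_and_sort_locally(records):
    """Local stand-in for the shuffle: bucket (key, value) pairs by range,
    then sort each bucket by key in ascending order."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[partition_for(key)].append((key, value))
    return [sorted(bucket, key=lambda kv: kv[0]) for bucket in buckets]


# Made-up sample data, not taken from the real job.
records = [("pear", 1), ("apple", 2), ("zebra", 3), ("grape", 4), ("fig", 5)]
partitions = repartition_and_sort_locally(records)

# Assumed PySpark usage on a pair RDD (not executed here):
#   rdd.repartitionAndSortWithinPartitions(
#       numPartitions=num_partitions, partitionFunc=partition_for)
```

With PySpark, the same `partition_for` function could presumably be passed as `partitionFunc`, so that each output partition corresponds to one key range and is written out already sorted.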
Can this difference be explained, or is there any way to improve the performance of Spark here?
Thanks & Regards,