spark-user mailing list archives

From 周浥尘 <zhouy...@gmail.com>
Subject Why is repartitionAndSortWithinPartitions slower than MapReduce?
Date Mon, 20 Aug 2018 12:52:57 GMT
Hi team,

I found that the Spark method *repartitionAndSortWithinPartitions* spends twice
as much time as MapReduce in some cases.
I want to repartition the dataset according to split keys and save each
partition to a file in ascending key order. As the doc says,
repartitionAndSortWithinPartitions “is more efficient than calling
`repartition` and then sorting within each partition because it can push the
sorting down into the shuffle machinery.”
I expected it to be faster than MR, but in practice it is much slower. I have
also adjusted several Spark configurations, but that didn't help. (Both Spark
and MapReduce run on a three-node cluster and use the same number of
partitions.)
Can this be explained, or is there any way to improve Spark's performance
here?
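
For reference, here is a minimal sketch of the pattern in question (the split
keys, paths, and partitioner below are illustrative placeholders, not the
actual job):

import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Stand-in for the real split-key partitioner: each record goes to the
// bucket whose upper split boundary is the first one >= its key.
class SplitKeyPartitioner(splits: Array[String]) extends Partitioner {
  override def numPartitions: Int = splits.length + 1
  override def getPartition(key: Any): Int = {
    val k = key.toString
    val idx = splits.indexWhere(k <= _)
    if (idx < 0) splits.length else idx
  }
}

object RepartitionAndSortExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("repartitionAndSortWithinPartitions"))

    // Illustrative input: tab-separated "key\tvalue" records.
    val pairs = sc.textFile("hdfs:///path/to/input")
      .map { line =>
        val Array(k, v) = line.split("\t", 2)
        (k, v)
      }

    // Partition by split key and sort within each partition during the
    // shuffle, then write one ascending-sorted file per partition.
    val splits = Array("d", "m", "t") // placeholder split keys
    pairs
      .repartitionAndSortWithinPartitions(new SplitKeyPartitioner(splits))
      .map { case (k, v) => s"$k\t$v" }
      .saveAsTextFile("hdfs:///path/to/output")

    sc.stop()
  }
}

The MapReduce job it is compared against does the equivalent: a total-order
partition on the same split keys followed by the framework's shuffle sort.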

Thanks & Regards,
Yichen
