spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: repartitionAndSortWithinPartitions HELP
Date Thu, 14 Jul 2016 17:55:55 GMT
repartitionAndSortWithinPartitions sorts by keys, not by values per key, so
it is not really a secondary sort by itself.

for secondary sort also check out:
https://github.com/tresata/spark-sorted
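
one common workaround, sketched below under stated assumptions, is to move the value into a composite key and partition on the original key only. then the key-based sort inside each partition also orders the values per key. the names KeyOnlyPartitioner and SecondarySortSketch, the sample data, and the local[2] master are hypothetical, chosen for illustration; this is not how spark-sorted does it.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Hypothetical partitioner: routes a composite (key, value) pair by its key
// part only, so all records for one original key land in the same partition.
class KeyOnlyPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(composite: Any): Int = composite match {
    case (k, _) =>
      val h = k.hashCode % numPartitions
      if (h < 0) h + numPartitions else h // keep the index non-negative
  }
}

object SecondarySortSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("secondary-sort-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data: (key, value) pairs in arbitrary order.
      val data = sc.parallelize(Seq(("b", 3), ("a", 2), ("b", 1), ("a", 5), ("a", 1)))

      // Move the value into the key so the shuffle sorts on (key, value).
      val composite = data.map { case (k, v) => ((k, v), ()) }

      // Each partition is sorted by the composite key using the implicit tuple
      // ordering. There is no ordering guarantee across partitions.
      val sorted = composite
        .repartitionAndSortWithinPartitions(new KeyOnlyPartitioner(2))
        .map { case ((k, v), _) => (k, v) }

      // glom() exposes each partition as an array, so the per-partition
      // ordering can be inspected directly.
      sorted.glom().collect().foreach(p => println(p.mkString(", ")))
    } finally {
      sc.stop()
    }
  }
}
```

the result is sorted within each partition only. a total order over the whole RDD would still need a range partitioner or a separate sortByKey.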


On Thu, Jul 14, 2016 at 1:09 PM, Punit Naik <naik.punit44@gmail.com> wrote:

> Hi guys
>
> In my spark/scala code I am implementing secondary sort. I wanted to know,
> when I call the "repartitionAndSortWithinPartitions" method, the whole
> (entire) RDD will be sorted or only the individual partitions will be
> sorted?
> If it's the latter case, will applying a "sortByKey" after
> "repartitionAndSortWithinPartitions" be faster now that the individual
> partitions are sorted?
>
> --
> Thank You
>
> Regards
>
> Punit Naik
>
