spark-user mailing list archives

From Punit Naik <naik.puni...@gmail.com>
Subject Re: repartitionAndSortWithinPartitions HELP
Date Thu, 14 Jul 2016 17:59:48 GMT
Hi Koert

I have already used "repartitionAndSortWithinPartitions" for secondary
sorting and it works fine. Just wanted to know whether it will sort the
entire RDD or not.
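
For reference, a minimal sketch of the behaviour being asked about. It assumes a local Spark setup; the object name, the helper `keysPerPartition`, and the sample data are made up for illustration. With a HashPartitioner each partition comes out sorted by key, but the RDD as a whole is not globally ordered. With a RangePartitioner the partitions cover increasing key ranges, so reading them in order yields a total order.

```scala
import org.apache.spark.{HashPartitioner, Partitioner, RangePartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical sketch object; names and data are illustrative only.
object SortWithinPartitionsSketch {

  // Repartition with the given partitioner, sort by key inside each
  // partition, and return the keys of every partition in partition order.
  def keysPerPartition(pairs: RDD[(Int, String)], partitioner: Partitioner): Array[Array[Int]] =
    pairs.repartitionAndSortWithinPartitions(partitioner).keys.glom().collect()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sort-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(
        Seq(5 -> "e", 1 -> "a", 4 -> "d", 2 -> "b", 3 -> "c", 6 -> "f"))

      // Hash partitioning: each partition is sorted on its own, but the
      // partitions interleave key ranges, so there is no global order.
      val hashed = keysPerPartition(pairs, new HashPartitioner(2))
      hashed.foreach(p => println(p.mkString("[", ", ", "]")))

      // Range partitioning: partition i holds keys below those of
      // partition i + 1, so the concatenation is globally sorted.
      val ranged = keysPerPartition(pairs, new RangePartitioner(2, pairs))
      println(ranged.flatten.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

As far as I understand, "sortByKey" performs its own range-partitioned shuffle, so calling it after "repartitionAndSortWithinPartitions" should not be expected to benefit from the partitions already being sorted.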

On Thu, Jul 14, 2016 at 11:25 PM, Koert Kuipers <koert@tresata.com> wrote:

> repartitionAndSortWithinPartitions sorts by keys, not values per key, so
> it is not really a secondary sort by itself.
>
> for secondary sort also check out:
> https://github.com/tresata/spark-sorted
>
>
> On Thu, Jul 14, 2016 at 1:09 PM, Punit Naik <naik.punit44@gmail.com>
> wrote:
>
>> Hi guys
>>
>> In my Spark/Scala code I am implementing a secondary sort. When I call
>> the "repartitionAndSortWithinPartitions" method, will the whole RDD be
>> sorted, or only the individual partitions?
>> If it's the latter, will applying "sortByKey" after
>> "repartitionAndSortWithinPartitions" be faster now that the individual
>> partitions are sorted?
>>
>> --
>> Thank You
>>
>> Regards
>>
>> Punit Naik
>>
>
>


-- 
Thank You

Regards

Punit Naik
