spark-issues mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-3655) Secondary sort
Date Tue, 21 Oct 2014 01:55:36 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177824#comment-14177824 ]

Matei Zaharia commented on SPARK-3655:
--------------------------------------

I believe you can build this on top of sortByKey with mapPartitions. The values for each key
are guaranteed to go to the same node (though we should document that). Or are you looking
to partition the keys by one function and have the values sorted by another? In that case
we added this weird repartitionAndSortWithinPartitions function to OrderedRDDFunctions that
would do the trick (it was added to make it easier to port apps from MapReduce).
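
The second approach in the comment above (partition by one function, sort by another) can be sketched as a plain-Python model. This is not Spark code and does not call repartitionAndSortWithinPartitions; it only imitates the idea under stated assumptions. The helper names partition_and_sort and sorted_values_per_key, the sample records, and the choice of two partitions are all hypothetical and chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter


def partition_and_sort(records, num_partitions):
    """Model: partition on the primary key, then sort each partition
    on the full (key, value) pair."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        # Partition on the key alone, so every value of a given key
        # lands in the same partition.
        partitions[hash(key) % num_partitions].append((key, value))
    for part in partitions:
        # Sorting on the (key, value) tuple is the "secondary" sort:
        # records are ordered by key first, then by value.
        part.sort()
    return partitions


def sorted_values_per_key(partition):
    """Walk one sorted partition and yield (key, values) pairs.
    The values come out in sorted order without any extra sorting."""
    for key, group in groupby(partition, key=itemgetter(0)):
        yield key, [value for _, value in group]


# Hypothetical sample data.
records = [("b", 3), ("a", 2), ("b", 1), ("a", 9), ("c", 5), ("a", 4)]

result = {}
for part in partition_and_sort(records, num_partitions=2):
    for key, values in sorted_values_per_key(part):
        result[key] = values
```

Which partition a key lands in depends on the hash, but each key ends up in exactly one partition, so the per-key value lists are the same on every run. In this model the values are collected into lists for readability; the point of a secondary sort is that they could instead be consumed as a sorted iterator per key.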

> Secondary sort
> --------------
>
>                 Key: SPARK-3655
>                 URL: https://issues.apache.org/jira/browse/SPARK-3655
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: koert kuipers
>            Priority: Minor
>
> Now that Spark has a sort-based shuffle, can we expect a secondary sort soon? There are
> some use cases where getting a sorted iterator of values per key is helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

