spark-issues mailing list archives

From "Aaron Davidson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-983) Support external sorting for RDD#sortByKey()
Date Tue, 27 May 2014 21:56:01 GMT

    [ https://issues.apache.org/jira/browse/SPARK-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010190#comment-14010190 ]

Aaron Davidson edited comment on SPARK-983 at 5/27/14 9:54 PM:
---------------------------------------------------------------

Does sound reasonable. For some reason it does not allow me to assign the issue to you, though.

Edit: Figured it out, thanks [~pwendell]!


was (Author: ilikerps):
Does sound reasonable. For some reason it does not allow me to assign the issue to you, though.

> Support external sorting for RDD#sortByKey()
> --------------------------------------------
>
>                 Key: SPARK-983
>                 URL: https://issues.apache.org/jira/browse/SPARK-983
>             Project: Spark
>          Issue Type: New Feature
>    Affects Versions: 0.9.0
>            Reporter: Reynold Xin
>            Assignee: Madhu Siddalingaiah
>
> Currently, RDD#sortByKey() is implemented by a mapPartitions which creates a buffer to
> hold the entire partition, then sorts it. This will cause an OOM if an entire partition cannot
> fit in memory, which is especially problematic for skewed data. Rather than OOMing, the behavior
> should be similar to the [ExternalAppendOnlyMap|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala],
> where we fall back to disk if we detect memory pressure.
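
The spill-to-disk behavior requested above can be illustrated with a generic external merge sort. The sketch below is not Spark's code and not the ExternalAppendOnlyMap implementation; it is a minimal Python illustration under stated assumptions. The names external_sort_by_key, _spill, _read_run and the max_in_memory parameter are hypothetical, and a fixed record-count threshold stands in for real memory-pressure detection.

```python
import heapq
import pickle
import tempfile


def _spill(sorted_run):
    """Write one sorted run to a temporary file and return the open file."""
    f = tempfile.TemporaryFile()
    for record in sorted_run:
        pickle.dump(record, f)
    f.seek(0)  # rewind so the run can be read back from the start
    return f


def _read_run(f):
    """Lazily yield the records of a spilled run, then close its file."""
    try:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return
    finally:
        f.close()


def external_sort_by_key(records, max_in_memory=100_000):
    """Return an iterator over (key, value) pairs in ascending key order.

    Assumption: a record-count threshold (max_in_memory) approximates
    memory pressure; a real system would track estimated bytes instead.
    """
    key = lambda kv: kv[0]
    runs, buffer = [], []
    for record in records:
        buffer.append(record)
        if len(buffer) >= max_in_memory:
            # Buffer is "full": sort it and spill it to disk as one run.
            buffer.sort(key=key)
            runs.append(_spill(buffer))
            buffer = []
    # The last partial buffer stays in memory and is merged with the runs.
    buffer.sort(key=key)
    streams = [_read_run(f) for f in runs] + [iter(buffer)]
    # k-way merge reads one record per run at a time.
    return heapq.merge(*streams, key=key)
```

With a small threshold, for example external_sort_by_key(iter(pairs), max_in_memory=2), the buffer never holds more than two records, so a skewed or oversized partition is sorted at the cost of extra disk I/O rather than failing with an OOM.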



--
This message was sent by Atlassian JIRA
(v6.2#6252)
