spark-issues mailing list archives

From "Burak Yavuz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-3280) Made sort-based shuffle the default implementation
Date Fri, 29 Aug 2014 05:09:08 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114873#comment-14114873 ]

Burak Yavuz commented on SPARK-3280:
------------------------------------

I don't have as detailed a comparison as Josh has, but for MLlib algorithms, sort-based
shuffle didn't show the performance boost that Josh's results show. These experiments used
16 m3.2xlarge instances. The difference here is that I used 128 partitions, far fewer than
the number of partitions in Josh's experiments.

!hash-sort-comp.png!

> Made sort-based shuffle the default implementation
> --------------------------------------------------
>
>                 Key: SPARK-3280
>                 URL: https://issues.apache.org/jira/browse/SPARK-3280
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>         Attachments: hash-sort-comp.png
>
>
> sort-based shuffle has lower memory usage and seems to outperform hash-based in almost all of our testing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org