spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-5763) Sort-based Groupby and Join to resolve skewed data
Date Tue, 24 Mar 2015 12:09:52 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377763#comment-14377763 ]

Apache Spark commented on SPARK-5763:
-------------------------------------

User 'lianhuiwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/5168

> Sort-based Groupby and Join to resolve skewed data
> --------------------------------------------------
>
>                 Key: SPARK-5763
>                 URL: https://issues.apache.org/jira/browse/SPARK-5763
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Lianhui Wang
>
> SPARK-4644 provides a way to handle skewed data, but when many keys are skewed, I think
> the approach in SPARK-4644 is inappropriate. Instead, we can use sort-merge to handle
> skewed group-by and skewed join. Because SPARK-2926 implements merge-sort, we can build
> sort-merge for skewed data on top of SPARK-2926. I have implemented sort-merge group-by,
> and in my tests it works very well on skewed data. Later I will implement sort-merge join
> to handle skewed joins.
> [~rxin] [~sandyr] [~andrewor14] what are your opinions on this?
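To make the sort-merge idea in the description concrete, here is a minimal sketch in Python. It is not the code in pull request 5168 and not a Spark API: the function names `sort_merge_group_by` and `sort_merge_join` are hypothetical, and the sketch assumes each input is a stream of (key, value) pairs already sorted by key, which is the role a merge-sort shuffle reader such as the one in SPARK-2926 would play. Group-by walks consecutive runs of equal keys without building a hash map of all keys; the join merges two sorted streams and buffers only the right-hand values for the one key currently being matched.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Iterable, Iterator, Tuple

Pair = Tuple[Any, Any]


def sort_merge_group_by(sorted_pairs: Iterable[Pair]) -> Iterator[Tuple[Any, Iterator[Any]]]:
    """Hypothetical helper: group (key, value) pairs that are already sorted by key.

    itertools.groupby only compares adjacent keys, so values for each key are
    streamed lazily. The caller must consume a key's values before moving on.
    """
    for key, run in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, (value for _, value in run)


def sort_merge_join(left: Iterable[Pair], right: Iterable[Pair]) -> Iterator[Tuple[Any, Tuple[Any, Any]]]:
    """Hypothetical helper: inner join of two key-sorted (key, value) streams."""
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    lg = next(left_groups, None)
    rg = next(right_groups, None)
    while lg is not None and rg is not None:
        lkey, lrun = lg
        rkey, rrun = rg
        if lkey < rkey:
            # Left key has no match on the right: skip its whole run.
            lg = next(left_groups, None)
        elif lkey > rkey:
            # Right key has no match on the left: skip its whole run.
            rg = next(right_groups, None)
        else:
            # Buffer only the right-hand values for this single key;
            # the left-hand values are streamed one at a time.
            right_values = [v for _, v in rrun]
            for _, lv in lrun:
                for rv in right_values:
                    yield lkey, (lv, rv)
            lg = next(left_groups, None)
            rg = next(right_groups, None)


if __name__ == "__main__":
    # sorted() stands in for the shuffle-side merge-sort in this sketch.
    left = sorted([("b", 3), ("a", 1), ("d", 4), ("b", 2)])
    right = sorted([("d", "w"), ("b", "x"), ("c", "z"), ("b", "y")])
    print([(k, list(vs)) for k, vs in sort_merge_group_by(left)])
    print(list(sort_merge_join(left, right)))
```

With this design, a heavily skewed key costs a sequential scan of its run rather than an entry in an in-memory hash table holding every key, although the join still materializes the right-hand values for the key being matched.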



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


