spark-issues mailing list archives

From "Apache Spark (JIRA)" <>
Subject [jira] [Commented] (SPARK-5763) Sort-based Groupby and Join to resolve skewed data
Date Tue, 24 Mar 2015 12:09:52 GMT


Apache Spark commented on SPARK-5763:

User 'lianhuiwang' has created a pull request for this issue:

> Sort-based Groupby and Join to resolve skewed data
> --------------------------------------------------
>                 Key: SPARK-5763
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Lianhui Wang
> SPARK-4644 provides a way to handle skewed data, but when many keys are skewed, I think
> that approach is inappropriate. Instead, we can use sort-merge to handle skewed groupby
> and skewed join. Because SPARK-2926 implements merge-sort, we can build sort-merge for
> skewed data on top of SPARK-2926. I have implemented sort-merge groupby, and it works very
> well on skewed data in my tests. Later I will implement sort-merge join to handle skewed joins.
> [~rxin] [~sandyr] [~andrewor14] what are your opinions on this?
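
For readers unfamiliar with the idea, here is a minimal sketch of sort-based groupby and sort-merge join in plain Python. It is not the code from the pull request or from SPARK-2926, and the function names `sort_based_group_by` and `sort_merge_join` are hypothetical. The assumption behind the sketch is that once records are sorted by key, all values for a key are adjacent and can be streamed one at a time, so a heavily skewed key does not have to be collected into a single in-memory hash-map entry. A real engine would use an external sort that spills to disk; `sorted()` here is in-memory only.

```python
from itertools import groupby
from operator import itemgetter


def sort_based_group_by(records):
    """Yield (key, values_iterator) pairs from (key, value) records."""
    # Sorting places equal keys next to each other, so each group can be
    # streamed instead of being accumulated in a hash map.
    ordered = sorted(records, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        # The values iterator must be consumed before asking for the next group.
        yield key, (value for _, value in group)


def sort_merge_join(left, right):
    """Inner-join two lists of (key, value) records on key."""
    left_sorted = sorted(left, key=itemgetter(0))
    right_sorted = sorted(right, key=itemgetter(0))
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        left_key = left_sorted[i][0]
        right_key = right_sorted[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the end of the run of right-side records with this key.
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][0] == left_key:
                j_end += 1
            # Pair every left-side record in the run with every right-side one.
            while i < len(left_sorted) and left_sorted[i][0] == left_key:
                for _, right_value in right_sorted[j:j_end]:
                    yield left_key, (left_sorted[i][1], right_value)
                i += 1
            j = j_end


if __name__ == "__main__":
    # "a" plays the role of a skewed key.
    records = [("a", 1), ("b", 2), ("a", 3), ("a", 4)]
    for key, values in sort_based_group_by(records):
        print(key, list(values))
```

In this sketch the join still buffers the right-side run for one key, so it only illustrates the merge step; handling a key that is skewed on both sides would need further work.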

This message was sent by Atlassian JIRA
