spark-issues mailing list archives

From Maciej Bryński (JIRA) <j...@apache.org>
Subject [jira] [Created] (SPARK-17786) [SPARK 2.0] Sorting algorithm gives higher skewness of output
Date Wed, 05 Oct 2016 11:52:20 GMT
Maciej Bryński created SPARK-17786:
--------------------------------------

             Summary: [SPARK 2.0] Sorting algorithm gives higher skewness of output
                 Key: SPARK-17786
                 URL: https://issues.apache.org/jira/browse/SPARK-17786
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.1
            Reporter: Maciej Bryński


Hi,
I'm using df.sort("column") to sort my data before saving it to Parquet.

With Spark 1.6.2, all output partitions were similar in size.
With Spark 2.0.0, three of the partitions are much bigger than the rest.

Can I go back to the previous sorting behaviour?
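
To put a number on "much bigger than the rest", the per-partition row counts can be compared between the two Spark versions. The following is only a rough sketch: the helper names partition_skew and largest_partitions are hypothetical, and the sample sizes are invented for illustration, not measurements from my job. On a real DataFrame the counts could be collected with df.sort("column").rdd.glom().map(len).collect(), which is noted in a comment but not run here.

```python
from statistics import mean


def partition_skew(sizes):
    """Return the ratio of the largest partition to the mean partition size.

    A value close to 1.0 means the partitions are evenly sized.
    """
    if not sizes:
        raise ValueError("sizes must not be empty")
    avg = mean(sizes)
    if avg == 0:
        # All partitions are empty, so there is no skew to report.
        return 1.0
    return max(sizes) / avg


def largest_partitions(sizes, n=3):
    """Return (partition index, row count) pairs for the n largest partitions."""
    return sorted(enumerate(sizes), key=lambda p: p[1], reverse=True)[:n]


# With a live SparkSession the sizes could come from (not executed here):
#   sizes = df.sort("column").rdd.glom().map(len).collect()
# Invented sizes for illustration only, not data from this report:
sizes = [100, 100, 900, 100, 800, 100, 700, 100]

print("skew ratio:", partition_skew(sizes))
print("largest partitions:", largest_partitions(sizes))
```

Running the same measurement on 1.6.2 and 2.0.0 with identical input would show whether the skew ratio really changed between versions, and which partition indexes are affected.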



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


