spark-issues mailing list archives

From "Shankar Koirala (Jira)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-32147) Spark: PartitionBy changing the columns value
Date Wed, 01 Jul 2020 10:17:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-32147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shankar Koirala updated SPARK-32147:
------------------------------------
    Labels: spark  (was: )

> Spark: PartitionBy changing the columns value 
> ----------------------------------------------
>
>                 Key: SPARK-32147
>                 URL: https://issues.apache.org/jira/browse/SPARK-32147
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell
>    Affects Versions: 3.0.0
>            Reporter: Shankar Koirala
>            Priority: Major
>              Labels: spark
>
> When a DataFrame is saved as Parquet or CSV with partitionBy, partition column values that
consist of digits followed by 'f' or 'd' (for example "6f" or "7d") come back changed when the data is read.
> Example:
> {code:scala}
> scala> import org.apache.spark.sql.SaveMode
> scala> val df = Seq(
>  | ("9q", 1),
>  | ("3k", 2),
>  | ("6f", 3),
>  | ("7f", 4),
>  | ("7d", 5)
>  | ).toDF("value", "id")
> df: org.apache.spark.sql.DataFrame = [value: string, id: int]
> scala> df.show(false)
> +-----+---+
> |value|id |
> +-----+---+
> |  9q | 1 |
> |  3k | 2 |
> |  6f | 3 |
> |  7f | 4 |
> |  7d | 5 |
> +-----+---+
> scala> df.write.partitionBy("value").mode(SaveMode.Overwrite).parquet("tmp_parquet")
> scala> spark.read.parquet("tmp_parquet").show(false)
> +---+-----+
> |id |value|
> +---+-----+
> |5  | 7.0 |
> |3  | 6.0 |
> |2  | 3k  |
> |4  | 7.0 |
> |1  | 9q  |
> +---+-----+
> {code}
> The same happens with the other format. Is this a bug, or is it expected behavior?
> Taken from [SO|https://stackoverflow.com/questions/62671684/spark-incorrectly-intepret-partition-name-ending-with-d-or-f-as-number-when]
>  
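
A possible explanation for the output in the report, offered as an assumption rather than a confirmed diagnosis: partition discovery infers column types from the directory names, and the JVM's floating-point parser accepts a trailing 'f' or 'd' type suffix. The sketch below uses plain Scala with no Spark dependency to show which of the reported values parse as a double. The object `SuffixParsingDemo` and the helper `parsesAsDouble` are hypothetical names used only for illustration; they are not Spark APIs.

```scala
import scala.util.Try

object SuffixParsingDemo {
  // Hypothetical helper (not a Spark API): returns the parsed value if
  // java.lang.Double.parseDouble accepts the raw string, otherwise None.
  def parsesAsDouble(raw: String): Option[Double] =
    Try(java.lang.Double.parseDouble(raw)).toOption

  def main(args: Array[String]): Unit = {
    // The partition values from the report.
    val values = Seq("9q", "3k", "6f", "7f", "7d")
    values.foreach { v =>
      // "6f", "7f" and "7d" parse as 6.0, 7.0 and 7.0; "9q" and "3k" do not parse.
      println(s"$v -> ${parsesAsDouble(v)}")
    }
  }
}
```

If type inference is indeed the cause, two things may be worth trying: setting `spark.sql.sources.partitionColumnTypeInference.enabled` to `false` before reading, which makes partition columns come back as strings, or passing an explicit schema to the reader.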



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

