spark-issues mailing list archives

From "Bruce Robbins (Jira)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-31598) LegacySimpleTimestampFormatter incorrectly interprets pre-Gregorian timestamps
Date Fri, 01 May 2020 16:00:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-31598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bruce Robbins resolved SPARK-31598.
-----------------------------------
    Fix Version/s: 3.1.0
                   3.0.0
       Resolution: Fixed

> LegacySimpleTimestampFormatter incorrectly interprets pre-Gregorian timestamps
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-31598
>                 URL: https://issues.apache.org/jira/browse/SPARK-31598
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Bruce Robbins
>            Priority: Major
>             Fix For: 3.0.0, 3.1.0
>
>
> As per discussion with [~maxgekk]:
> {{LegacySimpleTimestampFormatter#parse}} misinterprets pre-Gregorian timestamps:
> {noformat}
> scala> sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
> res0: org.apache.spark.sql.DataFrame = [key: string, value: string]
> scala> val df1 = Seq("0002-01-01 00:00:00", "1000-01-01 00:00:00", "1800-01-01 00:00:00").toDF("expected")
> df1: org.apache.spark.sql.DataFrame = [expected: string]
> scala> val df2 = df1.select('expected, to_timestamp('expected, "yyyy-MM-dd HH:mm:ss").as("actual"))
> df2: org.apache.spark.sql.DataFrame = [expected: string, actual: timestamp]
> scala> df2.show(truncate=false)
> +-------------------+-------------------+
> |expected           |actual             |
> +-------------------+-------------------+
> |0002-01-01 00:00:00|0001-12-30 00:00:00|
> |1000-01-01 00:00:00|1000-01-06 00:00:00|
> |1800-01-01 00:00:00|1800-01-01 00:00:00|
> +-------------------+-------------------+
> scala> 
> {noformat}
> Legacy timestamp parsing of JSON and CSV files is correct, so {{LegacyFastTimestampFormatter}} apparently does not have this issue (this still needs to be double-checked).
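The shifts in the "actual" column match the day offset between the Julian and the proleptic Gregorian calendars. For 0002-01-01 that offset is -2 days, for 1000-01-01 it is +5 days, and for 1800, which is after the 1582 cutover, it is 0. {{SimpleDateFormat}} is backed by {{java.util.GregorianCalendar}}, a hybrid calendar that counts dates before 1582-10-15 as Julian. Spark 3.0's {{java.time}}-based code uses the proleptic Gregorian calendar. The plain-JVM sketch below reproduces the effect outside Spark. The object name {{HybridCalendarDemo}} and its methods {{viaInstant}} and {{viaFields}} are hypothetical and are not part of Spark. {{viaFields}} shows one way to keep the typed wall-clock value: copy the local fields instead of the instant. It is not necessarily how the actual fix was implemented.

```scala
import java.text.SimpleDateFormat
import java.time.{LocalDateTime, ZoneOffset}
import java.util.{Calendar, Date, GregorianCalendar, TimeZone}

// Hypothetical demo object, not Spark code.
object HybridCalendarDemo {
  private val utc: TimeZone = TimeZone.getTimeZone("UTC")

  // SimpleDateFormat uses java.util.GregorianCalendar: Julian rules before
  // the 1582-10-15 cutover, Gregorian rules after it.
  private def parseHybrid(s: String): Date = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    fmt.setTimeZone(utc)
    fmt.parse(s)
  }

  // Reinterpret the parsed instant on java.time's proleptic Gregorian
  // calendar; pre-cutover dates shift by the Julian/Gregorian day offset.
  def viaInstant(s: String): LocalDateTime = {
    val millis = parseHybrid(s).getTime
    LocalDateTime.ofEpochSecond(java.lang.Math.floorDiv(millis, 1000L), 0, ZoneOffset.UTC)
  }

  // Copy the local fields out of the hybrid calendar instead, preserving the
  // wall-clock value. Assumes AD years only (Calendar.YEAR is era-relative).
  def viaFields(s: String): LocalDateTime = {
    val cal = new GregorianCalendar(utc)
    cal.setTime(parseHybrid(s))
    LocalDateTime.of(
      cal.get(Calendar.YEAR),
      cal.get(Calendar.MONTH) + 1, // Calendar.MONTH is 0-based
      cal.get(Calendar.DAY_OF_MONTH),
      cal.get(Calendar.HOUR_OF_DAY),
      cal.get(Calendar.MINUTE),
      cal.get(Calendar.SECOND))
  }

  def main(args: Array[String]): Unit =
    Seq("0002-01-01 00:00:00", "1000-01-01 00:00:00", "1800-01-01 00:00:00").foreach { s =>
      println(s"$s  viaInstant: ${viaInstant(s)}  viaFields: ${viaFields(s)}")
    }
}
```

In this sketch, {{viaInstant}} should yield the same dates as the "actual" column in the report: 0001-12-30, 1000-01-06 and 1800-01-01. {{viaFields}} should return the "expected" values unchanged.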



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


