spark-issues mailing list archives

From "Apache Spark (Jira)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-31030) Backward Compatibility for Parsing and Formatting Datetime
Date Fri, 01 May 2020 23:56:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-31030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097749#comment-17097749
] 

Apache Spark commented on SPARK-31030:
--------------------------------------

User 'dilipbiswal' has created a pull request for this issue:
https://github.com/apache/spark/pull/28433

> Backward Compatibility for Parsing and Formatting Datetime
> ----------------------------------------------------------
>
>                 Key: SPARK-31030
>                 URL: https://issues.apache.org/jira/browse/SPARK-31030
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Yuanjian Li
>            Assignee: Yuanjian Li
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: image-2020-03-04-10-54-05-208.png, image-2020-03-04-10-54-13-238.png
>
>
> *Background*
> In Spark 2.4 and earlier, datetime parsing, formatting and conversion are performed
using the hybrid calendar ([Julian + Gregorian|https://docs.oracle.com/javase/7/docs/api/java/util/GregorianCalendar.html]).
> Since the Proleptic Gregorian calendar is the de facto calendar worldwide, as well as the
one chosen by the ANSI SQL standard, Spark 3.0 switches to it by using the Java 8 API classes (the
java.time packages, which are based on the [ISO chronology|https://docs.oracle.com/javase/8/docs/api/java/time/chrono/IsoChronology.html]).
> The switch was completed in SPARK-26651.
>  
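To make the calendar difference concrete, here is a minimal Java sketch. It is not taken from Spark, and the class and method names are hypothetical. It computes the number of days since 1970-01-01 for the same year-month-day label under both calendars, assuming UTC for the legacy calendar. For modern dates the two results agree. For dates before the 1582 cutover, the hybrid calendar reads the label as a Julian date, so the results diverge.

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper for illustration only; not part of Spark.
final class CalendarGapSketch {
    // Days since 1970-01-01 in the Proleptic Gregorian calendar (java.time).
    static long prolepticEpochDay(int year, int month, int day) {
        return LocalDate.of(year, month, day).toEpochDay();
    }

    // Days since 1970-01-01 in the hybrid Julian + Gregorian calendar
    // (java.util.GregorianCalendar). UTC is an assumption made here so that
    // midnight falls exactly on a day boundary.
    static long hybridEpochDay(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();                    // reset all fields, including time of day
        cal.set(year, month - 1, day);  // Calendar months are zero-based
        return Math.floorDiv(cal.getTimeInMillis(), 86_400_000L);
    }
}
```

Comparing `hybridEpochDay(1000, 1, 1)` with `prolepticEpochDay(1000, 1, 1)` shows the kind of shift that a value written by Spark 2.4 can exhibit when read back under the new calendar.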
> *Problem*
> Switching to the Java 8 datetime API breaks backward compatibility with Spark 2.4 and earlier
when parsing datetimes. Spark needs its own pattern definitions for datetime parsing and formatting.
>  
> *Solution*
> To avoid unexpected result changes after the underlying datetime API switch, we propose
the following solution.
>  * Introduce a fallback mechanism: when the Java 8-based parser fails, detect
these behavior differences by falling back to the legacy parser, and fail with a user-friendly
error message that tells users what changed and how to fix the pattern.
>  * Document Spark’s datetime patterns: Spark’s date-time formatter is decoupled
from the Java patterns. Spark’s patterns are mainly based on the [Java 7 patterns|https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html]
(for better backward compatibility), with customized logic to handle the breaking changes
between [Java 7|https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html]
and [Java 8|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html]
pattern strings. Below are the customized rules:
> ||Pattern||Java 7||Java 8||Example||Rule||
> |u|Day number of week (1 = Monday, ..., 7 = Sunday)|Year (unlike y, u accepts a negative value to represent BC, while y must be used together with G to do the same thing)|!image-2020-03-04-10-54-05-208.png!|Substitute ‘e’ for ‘u’ and use the Java 8 parser to parse the string. If it parses, return the result; otherwise, fall back to ‘u’ and use the legacy Java 7 parser. If the legacy parser succeeds, throw an exception asking users to change the pattern string or turn on the legacy mode; otherwise, return NULL as Spark 2.4 does.|
> |z|General time zone, which also accepts [RFC 822 time zones|#rfc822timezone]|Only accepts a time-zone name, e.g. Pacific Standard Time; PST|!image-2020-03-04-10-54-13-238.png!|The semantics of ‘z’ differ between Java 7 and Java 8. Here, Spark 3.0 follows the semantics of Java 8. Use the Java 8 parser to parse the string. If it parses, return the result; otherwise, use the legacy Java 7 parser. If the legacy parser succeeds, throw an exception asking users to change the pattern string or turn on the legacy mode; otherwise, return NULL as Spark 2.4 does.|
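The fallback rule described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the pull request: the class `FallbackParseSketch`, the method `parseDate`, and the exception `PatternUpgradeException` are hypothetical names, the sketch handles dates only, and it omits the ‘u’ to ‘e’ substitution and the legacy-mode switch.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical sketch of the proposed fallback; names are not from Spark.
final class FallbackParseSketch {
    // Hypothetical exception that tells the user the result depends on the parser.
    static final class PatternUpgradeException extends RuntimeException {
        PatternUpgradeException(String message) {
            super(message);
        }
    }

    static LocalDate parseDate(String text, String pattern) {
        // Step 1: try the Java 8 (java.time) parser first.
        try {
            return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern, Locale.US));
        } catch (DateTimeParseException | IllegalArgumentException e) {
            // Fall through to the legacy check below.
        }
        // Step 2: check whether the legacy (Java 7 style) parser accepts the input.
        try {
            SimpleDateFormat legacy = new SimpleDateFormat(pattern, Locale.US);
            legacy.setLenient(false);
            legacy.parse(text);
        } catch (ParseException | IllegalArgumentException e) {
            // Both parsers reject the input: return null, as Spark 2.4 does.
            return null;
        }
        // Step 3: only the legacy parser succeeded, so the behavior changed.
        throw new PatternUpgradeException(
            "Parsing '" + text + "' with pattern '" + pattern
                + "' succeeds only with the legacy parser; "
                + "change the pattern or turn on the legacy mode.");
    }
}
```

With this sketch, `parseDate("2020-05-01", "yyyy-MM-dd")` returns a date, `parseDate("not-a-date", "yyyy-MM-dd")` returns `null`, and `parseDate("2020-5-1", "yyyy-MM-dd")` throws, because `SimpleDateFormat` accepts single-digit fields for `MM` and `dd` while `DateTimeFormatter` requires exactly two digits.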
>  
>  
>  
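As a small illustration of why the pattern letter ‘u’ needs a custom rule, the following Java sketch (hypothetical class and method names, not Spark code) formats the same date with the pattern "u" using both APIs. The legacy `SimpleDateFormat` treats ‘u’ as the day number of the week, while `DateTimeFormatter` treats it as the year.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Spark.
final class PatternLetterUSketch {
    // Legacy API: 'u' is the day number of the week (1 = Monday, ..., 7 = Sunday).
    static String legacyU(int year, int month, int day) {
        // Calendar months are zero-based; the default time zone is used for both
        // building and formatting the date, so the local date is preserved.
        GregorianCalendar cal = new GregorianCalendar(year, month - 1, day);
        return new SimpleDateFormat("u", Locale.US).format(cal.getTime());
    }

    // Java 8 API: 'u' is the (proleptic) year.
    static String java8U(int year, int month, int day) {
        return DateTimeFormatter.ofPattern("u", Locale.US)
            .format(LocalDate.of(year, month, day));
    }
}
```

For Friday, 1 May 2020, `legacyU(2020, 5, 1)` yields "5" and `java8U(2020, 5, 1)` yields "2020".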



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

