spark-issues mailing list archives

From Andreas Költringer (JIRA) <j...@apache.org>
Subject [jira] [Updated] (SPARK-28515) to_timestamp returns null for summer time switch dates
Date Thu, 25 Jul 2019 11:29:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-28515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Költringer updated SPARK-28515:
---------------------------------------
    Description: 
I am not sure whether this is a bug, but it is very unexpected behavior, so I'd like some clarification.

When parsing datetime strings, if the date-time in question falls into a "summer time switch"
(daylight saving transition; e.g. in most of Europe, on 2015-03-29 the clock jumped from 2am
to 3am), the {{to_timestamp}} method returns {{NULL}}.
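For context: such a local time simply does not exist as a wall-clock instant in those zones. This can be checked outside Spark with plain Python; a minimal sketch, using {{Europe/Vienna}} as an example zone (any zone with the 2015-03-29 spring-forward gap would do):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

tz = ZoneInfo("Europe/Vienna")  # example zone with the 2015-03-29 DST gap
dt = datetime(2015, 3, 29, 2, 0, tzinfo=tz)  # wall-clock time inside the gap
# Round-tripping through UTC does not come back to 02:00, because that
# local time was skipped when the clock jumped from 02:00 to 03:00.
roundtrip = dt.astimezone(ZoneInfo("UTC")).astimezone(tz)
print(roundtrip)  # 2015-03-29 03:00:00+02:00
```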

Minimal Example (using Python):

{code}
>>> from pyspark.sql import functions as F
>>> df = spark.createDataFrame([('201503290159',), ('201503290200',)], ['date_str'])
>>> df.withColumn('timestamp', F.to_timestamp('date_str', 'yyyyMMddhhmm')).show()
+------------+-------------------+
|    date_str|          timestamp|
+------------+-------------------+
|201503290159|2015-03-29 01:59:00|
|201503290200|               null|
+------------+-------------------+
{code}

A workaround is to set Spark's session time zone to UTC:

{{spark.conf.set("spark.sql.session.timeZone", "UTC")}}

(see e.g. [https://stackoverflow.com/q/52594762])
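The workaround helps because UTC has no DST transitions, so every wall-clock minute is a valid instant. A minimal sketch of the same parse in plain Python (no Spark involved; the format string is adapted to {{strptime}} syntax):

```python
from datetime import datetime, timezone

# Under UTC there is no DST gap, so 2015-03-29 02:00 is a valid instant.
dt = datetime.strptime("201503290200", "%Y%m%d%H%M").replace(tzinfo=timezone.utc)
print(dt.isoformat())  # 2015-03-29T02:00:00+00:00
```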

 

Plain Java does not behave this way; e.g., the following parses as expected:

{code:java}
import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.util.Date;

SimpleDateFormat dateFormat = new SimpleDateFormat("yyyyMMddhhmm");
Date parsedDate = dateFormat.parse("201503290201");
Timestamp timestamp = new java.sql.Timestamp(parsedDate.getTime());
{code}

 

So, is this really the intended behaviour? Is there documentation about this? Thanks.



> to_timestamp returns null for summer time switch dates
> ------------------------------------------------------
>
>                 Key: SPARK-28515
>                 URL: https://issues.apache.org/jira/browse/SPARK-28515
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>         Environment: Spark 2.4.3 on Linux 64bit, openjdk-8-jre-headless
>            Reporter: Andreas Költringer
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


