spark-issues mailing list archives

From "Alexandre Gattiker (Jira)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-17914) Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp
Date Mon, 14 Oct 2019 11:37:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950932#comment-16950932 ]

Alexandre Gattiker edited comment on SPARK-17914 at 10/14/19 11:36 AM:
-----------------------------------------------------------------------

As reported by other commenters, the issue is still present in from_json in Spark 2.4.3
(Azure Databricks 5.5 LTS):

{{sc.parallelize(List("2019-10-14T09:39:07.3220000Z")).toDF.select('value.cast("timestamp"))}}
{{// 2019-10-14T09:39:07.322+0000}}
{{// correct time parsing outside of from_json}}
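As a cross-check outside Spark, the expected value for the 7-digit input can be computed with `java.time`. This is a minimal sketch that assumes the target is Spark's microsecond-precision timestamp: `Instant.parse` accepts 0 to 9 fractional digits, and `truncatedTo(ChronoUnit.MICROS)` drops anything finer than a microsecond. `ReferenceParse` and `toMicrosInstant` are hypothetical names, not Spark APIs.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

object ReferenceParse {
  // Parse an ISO-8601 UTC instant (java.time accepts 0 to 9 fractional
  // digits), then drop anything finer than a microsecond, the precision
  // Spark's TimestampType stores.
  def toMicrosInstant(s: String): Instant =
    Instant.parse(s).truncatedTo(ChronoUnit.MICROS)

  def main(args: Array[String]): Unit =
    println(toMicrosInstant("2019-10-14T09:39:07.3220000Z"))
}
```

Both the 7-digit string from the comment and a 9-digit nanosecond string end up with exactly six fractional digits or fewer, so no digit is ever reinterpreted as a larger unit.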

{{import org.apache.spark.sql.functions.from_json}}
{{import org.apache.spark.sql.types._}}
{{val schema = StructType(StructField("a", TimestampType, false) :: Nil)}}
{{sc.parallelize(List("""{"a":"2019-10-14T09:39:07.3220000Z"}""")).toDF.select(from_json('value, schema))}}
{{// {"a":"2019-10-14T10:32:47.000+0000"}}}
{{// wrong time: 09:39:07 + 3220 seconds, as if the fraction 3220000 were milliseconds}}

{{val schema = StructType(StructField("a", TimestampType, false) :: Nil)}}
{{sc.parallelize(List("""{"a":"2019-10-14T09:39:07.322000Z"}""")).toDF.select(from_json('value, schema))}}
{{// {"a":"2019-10-14T09:44:29.000+0000"}}}
{{// wrong time: 09:39:07 + 322 seconds, as if the fraction 322000 were milliseconds}}

{{val schema = StructType(StructField("a", TimestampType, false) :: Nil)}}
{{sc.parallelize(List("""{"a":"2019-10-14T09:39:07.322Z"}""")).toDF.select(from_json('value, schema))}}
{{// {"a":"2019-10-14T09:39:07.322+0000"}}}
{{// correct time}}
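The wrong results match what happens when the fraction digits are read as a whole number of milliseconds and a lenient calendar rolls the excess over into seconds and minutes (3220000 ms = 3220 s, 322000 ms = 322 s). The sketch below reproduces the three from_json outputs with plain `java.text.SimpleDateFormat`. The trailing `Z` is matched as the quoted literal `'Z'` so that `SSS` is read greedily, because `SimpleDateFormat` enforces the letter count when one field directly follows another. It illustrates the mechanism only. It is an assumption, not something stated in the comment, that Spark 2.4's JSON reader behaves this way through its `timestampFormat` option (default `yyyy-MM-dd'T'HH:mm:ss.SSSXXX`). `LenientMillis` is a hypothetical name.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale, TimeZone}

object LenientMillis {
  private val Utc = TimeZone.getTimeZone("UTC")

  // SimpleDateFormat is lenient by default: SSS accepts any number of
  // digits here, and an out-of-range millisecond value (e.g. 3220000)
  // is rolled over into seconds and minutes instead of being rejected.
  def parseLenient(s: String): Date = {
    val in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.US)
    in.setTimeZone(Utc)
    in.parse(s)
  }

  // Render the parsed instant in UTC with millisecond precision.
  def render(d: Date): String = {
    val out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS", Locale.US)
    out.setTimeZone(Utc)
    out.format(d)
  }

  def main(args: Array[String]): Unit =
    Seq(
      "2019-10-14T09:39:07.3220000Z", // 7 fraction digits
      "2019-10-14T09:39:07.322000Z",  // 6 fraction digits
      "2019-10-14T09:39:07.322Z"      // 3 fraction digits
    ).foreach(s => println(s"$s -> ${render(parseLenient(s))}"))
}
```

Only the 3-digit input is read as intended, which is consistent with the last from_json example being the only correct one.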



> Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-17914
>                 URL: https://issues.apache.org/jira/browse/SPARK-17914
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Oksana Romankova
>            Assignee: Anton Okolnychyi
>            Priority: Major
>             Fix For: 2.2.0, 2.3.0
>
>
> In some cases when timestamps contain nanoseconds they will be parsed incorrectly. 
> Examples: 
> "2016-05-14T15:12:14.0034567Z" -> "2016-05-14 15:12:14.034567"
> "2016-05-14T15:12:14.000345678Z" -> "2016-05-14 15:12:14.345678"
> The issue seems to be happening in DateTimeUtils.stringToTimestamp(). It assumes that
> only a 6-digit fraction of a second will be passed.
> With this being the case, I would suggest either discarding nanoseconds automatically
> or throwing an exception prompting users to pre-format timestamps to microsecond precision
> before casting to Timestamp.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

