spark-issues mailing list archives

From "Oksana Romankova (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-17914) Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp
Date Thu, 13 Oct 2016 19:42:21 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572957#comment-15572957 ]

Oksana Romankova commented on SPARK-17914:
------------------------------------------

Sean, I can't find any evidence that ISO 8601 does not support nanoseconds. All it says is
that a fraction of a second is supported and must follow a comma or dot. Parsing libraries
that support ISO 8601 differ in their precision limits. For instance, in Python,
datetime.strptime() only supports precision down to microseconds and will throw an exception
if nanoseconds are supplied in the input string. While that may not be ideal for those who
need to retain nanosecond precision after parsing, it is acceptable behavior: those who do
not need nanosecond precision can catch the exception or preemptively truncate the input
string. Spark SQL's DateTimeUtils.stringToTimestamp() neither throws nor truncates properly,
which results in an incorrect timestamp. For the examples from the issue description, the
acceptable truncation would be:

```
"2016-05-14T15:12:14.0034567Z" -> "2016-05-14 15:12:14.003456"
"2016-05-14T15:12:14.000345678Z" -> "2016-05-14 15:12:14.000345"
```
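
As a rough illustration of the "catch or preemptively truncate" options mentioned above, here is a minimal Python sketch. The helper names `truncate_to_micros` and `parse_utc` are hypothetical, and the sketch assumes a dot as the fraction separator and a trailing `Z`. It only shows the expected truncation and does not reflect how Spark parses timestamps.

```python
import re
from datetime import datetime, timezone

# Hypothetical helpers for illustration; not part of Spark or the standard library.
# Assumption: the fraction is introduced by a dot (ISO 8601 also allows a comma).
_EXTRA_FRACTION_DIGITS = re.compile(r"(\.\d{6})\d+")

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def truncate_to_micros(ts: str) -> str:
    """Drop fractional-second digits beyond the sixth (truncate, do not round)."""
    return _EXTRA_FRACTION_DIGITS.sub(r"\1", ts, count=1)


def parse_utc(ts: str) -> datetime:
    """Parse a UTC timestamp such as 2016-05-14T15:12:14.0034567Z."""
    parsed = datetime.strptime(truncate_to_micros(ts), ISO_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


for raw in ("2016-05-14T15:12:14.0034567Z", "2016-05-14T15:12:14.000345678Z"):
    try:
        # %f accepts at most six digits, so the untruncated string is rejected.
        datetime.strptime(raw, ISO_FORMAT)
    except ValueError as exc:
        print("strptime rejects the raw string:", exc)
    # After truncation the string parses and keeps the leading six digits.
    print(parse_utc(raw).strftime("%Y-%m-%d %H:%M:%S.%f"))
```

Truncating rather than rounding keeps the result consistent with the mapping shown above, and inputs with six or fewer fractional digits pass through unchanged.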

> Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-17914
>                 URL: https://issues.apache.org/jira/browse/SPARK-17914
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Oksana Romankova
>
> In some cases when timestamps contain nanoseconds they will be parsed incorrectly. 
> Examples: 
> "2016-05-14T15:12:14.0034567Z" -> "2016-05-14 15:12:14.034567"
> "2016-05-14T15:12:14.000345678Z" -> "2016-05-14 15:12:14.345678"
> The issue seems to be happening in DateTimeUtils.stringToTimestamp(). It assumes that
> only a 6-digit fraction of a second will be passed.
> With this being the case, I would suggest either discarding nanoseconds automatically
> or throwing an exception prompting users to pre-format timestamps to microsecond precision
> before casting to Timestamp.

