[ https://issues.apache.org/jira/browse/SPARK-15613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitry Bushev updated SPARK-15613:
----------------------------------
Description:
There is an issue with the {{DateTimeUtils.daysToMillis}} implementation. It affects {{DateTimeUtils.toJavaDate}}
and ultimately CatalystTypeConverter, i.e., the conversion of a date stored as {{Int}} days since the
epoch in InternalRow to the {{java.sql.Date}} in the Row returned to the user.
The issue can be reproduced with this test (all of the following tests were run in my default time zone,
Europe/Moscow):
{code}
scala> for (days <- 0 to 20000 if millisToDays(daysToMillis(days)) != days) yield days
res23: scala.collection.immutable.IndexedSeq[Int] = Vector(4108, 4473, 4838, 5204, 5568, 5932,
6296, 6660, 7024, 7388, 8053, 8487, 8851, 9215, 9586, 9950, 10314, 10678, 11042, 11406, 11777,
12141, 12505, 12869, 13233, 13597, 13968, 14332, 14696, 15060)
{code}
For example, day {{4108}} since the epoch should be {{1981-04-01}}, but {{toJavaDate}} returns {{1981-03-31}}:
{code}
scala> DateTimeUtils.toJavaDate(4107)
res25: java.sql.Date = 1981-03-31
scala> DateTimeUtils.toJavaDate(4108)
res26: java.sql.Date = 1981-03-31
scala> DateTimeUtils.toJavaDate(4109)
res27: java.sql.Date = 1981-04-02
{code}
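The report does not quote the existing code, so the following is only a minimal sketch of one way such a result can arise. It assumes an offset-based conversion that looks up the zone offset at UTC midnight rather than at local midnight, and it assumes the JVM's tz data has Europe/Moscow moving from UTC+3 to UTC+4 at the start of 1981-04-01. The names {{NaiveDayConversion}}, {{naiveDaysToMillis}} and {{naiveMillisToDays}} are hypothetical and are not Spark APIs.

```scala
import java.util.TimeZone

object NaiveDayConversion {
  private val MillisPerDay = 24L * 60 * 60 * 1000

  // Assumed offset-based conversion (not quoted from the report): the offset
  // is looked up at UTC midnight of the given day, not at local midnight.
  def naiveDaysToMillis(days: Int, tz: TimeZone): Long = {
    val millisUtc = days.toLong * MillisPerDay
    millisUtc - tz.getOffset(millisUtc)
  }

  // Reverse direction: shift the instant by its zone offset, then floor to days.
  def naiveMillisToDays(millisUtc: Long, tz: TimeZone): Int = {
    val millisLocal = millisUtc + tz.getOffset(millisUtc)
    math.floor(millisLocal.toDouble / MillisPerDay).toInt
  }

  def main(args: Array[String]): Unit = {
    val moscow = TimeZone.getTimeZone("Europe/Moscow")
    val millis = naiveDaysToMillis(4108, moscow)
    // Prints the day the round trip lands on; it differs from 4108 when the
    // offset at UTC midnight differs from the offset at local midnight.
    println(naiveMillisToDays(millis, moscow))
  }
}
```

Under these assumptions, the offset at UTC midnight of day 4108 is already the post-transition one, so the computed instant falls in the last hour of the previous local day.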
There was a previous unsuccessful attempt to work around the problem in SPARK-11415. The issue
seems to involve flaws in the Java date implementation, and I don't see how it can be fixed
without third-party libraries.
I was not able to identify the library of choice for Spark. The following implementation uses
[JSR-310|http://www.threeten.org/]:
{code}
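import java.time.{Instant, LocalDate, ZoneId}

// SQLDate is the Int count of days since the epoch described above.
def millisToDays(millisUtc: Long): SQLDate = {
  val instant = Instant.ofEpochMilli(millisUtc)
  // Resolve the instant in the JVM default time zone
  val zonedDateTime = instant.atZone(ZoneId.systemDefault)
  zonedDateTime.toLocalDate.toEpochDay.toInt
}

def daysToMillis(days: SQLDate): Long = {
  val localDate = LocalDate.ofEpochDay(days)
  // atStartOfDay picks the first valid local time on that date
  val zonedDateTime = localDate.atStartOfDay(ZoneId.systemDefault)
  zonedDateTime.toInstant.toEpochMilli
}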
{code}
This implementation produces correct results:
{code}
scala> for (days <- 0 to 20000 if millisToDays(daysToMillis(days)) != days) yield days
res37: scala.collection.immutable.IndexedSeq[Int] = Vector()
scala> new java.sql.Date(daysToMillis(4108))
res36: java.sql.Date = 1981-04-01
{code}
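Because the tests above depend on the JVM default time zone, it may help to reproduce them with the zone passed explicitly. The following is a minimal sketch of the same JSR-310 logic. The {{zone}} parameter and the {{ZonedDayConversion}} object name are assumptions added for illustration and are not part of the proposal, and {{Int}} stands in for {{SQLDate}}.

```scala
import java.time.{Instant, LocalDate, ZoneId}

object ZonedDayConversion {
  // Same logic as the proposal, with the zone passed in explicitly.
  def millisToDays(millisUtc: Long, zone: ZoneId): Int =
    Instant.ofEpochMilli(millisUtc).atZone(zone).toLocalDate.toEpochDay.toInt

  // atStartOfDay resolves to the first valid local time on that date,
  // so a clock change at local midnight still maps to the same date.
  def daysToMillis(days: Int, zone: ZoneId): Long =
    LocalDate.ofEpochDay(days.toLong).atStartOfDay(zone).toInstant.toEpochMilli

  def main(args: Array[String]): Unit = {
    val moscow = ZoneId.of("Europe/Moscow")
    // Collect every day in the tested range whose round trip changes it.
    val mismatches =
      for (days <- 0 to 20000 if millisToDays(daysToMillis(days, moscow), moscow) != days)
        yield days
    println(mismatches)
  }
}
```

Passing the zone as an argument also avoids relying on {{java.sql.Date.toString}}, which formats in the default time zone.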
> Incorrect days to millis conversion
> ------------------------------------
>
> Key: SPARK-15613
> URL: https://issues.apache.org/jira/browse/SPARK-15613
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Environment: java version "1.8.0_91"
> Reporter: Dmitry Bushev