spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-25902) Support for dates with milliseconds in Arrow bindings
Date Thu, 01 Nov 2018 01:22:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-25902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671002#comment-16671002 ]

Apache Spark commented on SPARK-25902:
--------------------------------------

User 'kevinyu98' has created a pull request for this issue:
https://github.com/apache/spark/pull/22918

> Support for dates with milliseconds in Arrow bindings
> -----------------------------------------------------
>
>                 Key: SPARK-25902
>                 URL: https://issues.apache.org/jira/browse/SPARK-25902
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Javier Luraschi
>            Priority: Major
>
> Currently, the Apache Arrow bindings for Java only support `Date` with the unit set to `DateUnit.DAY`, see [ArrowUtils.scala#L72|https://github.com/apache/spark/blob/8c2edf46d0f89e5ec54968218d89f30a3f8190bc/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowUtils.scala#L72].
> However, the Spark Arrow bindings for `R` are adding support for mapping `POSIXct` to `Date` with `DateUnit.MILLISECOND`. Running the following code triggers the warning shown after it:
>  
> {code}
> # Development versions of the arrow R package and of sparklyr's Arrow branch
> devtools::install_github("apache/arrow", subdir = "r")
> devtools::install_github("rstudio/sparklyr", ref = "feature/arrow")
> Sys.setenv("SPARK_HOME_VERSION" = "2.3.2")
> library(sparklyr)
> library(arrow)
> sc <- spark_connect(master = "local", spark_home = "<path-to-spark-sources>")
> # A data frame with a single date-time column
> dates <- data.frame(dates = c(
>   as.POSIXlt(Sys.time(), "GMT"),
>   as.POSIXlt(Sys.time(), "EST")
> ))
> # Copy to Spark; with sparklyr's Arrow branch this attempts Arrow serialization
> dates_tbl <- sdf_copy_to(sc, dates, overwrite = TRUE)
> {code}
>  
>  
> {code:java}
> Arrow disabled due to columns: dates
> {code}
> This means that Arrow serialization is disabled because Spark throws the following exception:
> {code:java}
> java.lang.UnsupportedOperationException: Unsupported data type: Date(MILLISECOND)
> {code}
>  
>  
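
The exception above can be pictured with a simplified model of a DAY-only type check. This Java sketch is hypothetical and uses only the standard library. `DateUnit` is a local stand-in for Arrow's `org.apache.arrow.vector.types.DateUnit`, and `fromArrowDate` returns a type name string rather than a Spark `DataType`. It is not the actual code in `ArrowUtils.scala`. It only shows how accepting `DAY` alone makes any `Date(MILLISECOND)` column fail with the message reported in the issue.

```java
// Hypothetical stand-in for Arrow's DateUnit enum, declared locally so the
// sketch compiles without Arrow on the classpath.
enum DateUnit { DAY, MILLISECOND }

public class ArrowDateMapping {
    // Simplified model of a DAY-only mapping: Date(DAY) maps to Spark's
    // DateType (named here as a string), and every other unit is rejected.
    static String fromArrowDate(DateUnit unit) {
        if (unit == DateUnit.DAY) {
            return "DateType";
        }
        throw new UnsupportedOperationException("Unsupported data type: Date(" + unit + ")");
    }

    public static void main(String[] args) {
        System.out.println(fromArrowDate(DateUnit.DAY)); // DateType
        try {
            fromArrowDate(DateUnit.MILLISECOND);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage()); // Unsupported data type: Date(MILLISECOND)
        }
    }
}
```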

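The two Arrow date units store values differently:

- A `Date(DAY)` value is a 32-bit signed count of days since the UNIX epoch (1970-01-01).
- A `Date(MILLISECOND)` value is a 64-bit signed count of milliseconds since the epoch.

Spark's `DateType` also represents dates as days since the epoch, which is why the day unit maps onto it directly. The following standard-library Java sketch converts between the two units. The class and method names are hypothetical, not Spark or Arrow APIs. `Math.floorDiv` keeps instants before 1970 on the correct day. The example value, 2018-11-01T01:22:00Z, shows that a millisecond value carrying a time of day (like the `POSIXct` values in the R reproduction) loses that time once it is reduced to whole days.

```java
import java.time.LocalDate;

public class DateUnitConversion {
    static final long MILLIS_PER_DAY = 86_400_000L;

    // Date(MILLISECOND) -> Date(DAY): floorDiv rounds toward negative infinity,
    // so pre-epoch instants land on the earlier day; the time of day is dropped.
    static int millisToDays(long epochMillis) {
        return Math.toIntExact(Math.floorDiv(epochMillis, MILLIS_PER_DAY));
    }

    // Date(DAY) -> Date(MILLISECOND): midnight UTC of the given day.
    static long daysToMillis(int epochDays) {
        return epochDays * MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        long millis = 1_541_035_320_000L; // 2018-11-01T01:22:00Z
        int days = millisToDays(millis);
        System.out.println(LocalDate.ofEpochDay(days)); // 2018-11-01
        System.out.println(daysToMillis(days));          // 1541030400000 (time of day lost)
    }
}
```

Whether millisecond dates should be truncated to `DateType` or mapped to a timestamp type is a design choice this sketch does not settle.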


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
