spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-19342) Datatype timestamp is converted to numeric in collect method
Date Mon, 13 Feb 2017 14:21:41 GMT

     [ https://issues.apache.org/jira/browse/SPARK-19342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen reassigned SPARK-19342:
---------------------------------

    Assignee: Fangzhou Yang

> Datatype timestamp is converted to numeric in collect method
> -------------------------------------------------------------
>
>                 Key: SPARK-19342
>                 URL: https://issues.apache.org/jira/browse/SPARK-19342
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.1.0
>            Reporter: Fangzhou Yang
>            Assignee: Fangzhou Yang
>             Fix For: 2.1.1, 2.2.0
>
>
> collect() returns a double instead of POSIXct for a timestamp column when NA is at the top of the column.
> The following code and output show how the bug can be reproduced:
> {code}
> > sparkR.session(master = "local")
> Spark package found in SPARK_HOME: /home/titicaca/spark-2.1
> Launching java with spark-submit command /home/titicaca/spark-2.1/bin/spark-submit   sparkr-shell /tmp/RtmpqmpZUg/backend_port363a898be92 
> Java ref type org.apache.spark.sql.SparkSession id 1 
> > df <- data.frame(col1 = c(0, 1, 2), 
> +                  col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA, as.POSIXct("2017-01-01 12:00:01")))
> > sdf1 <- createDataFrame(df)
> > print(dtypes(sdf1))
> [[1]]
> [1] "col1"   "double"
> [[2]]
> [1] "col2"      "timestamp"
> > df1 <- collect(sdf1)
> > print(lapply(df1, class))
> $col1
> [1] "numeric"
> $col2
> [1] "POSIXct" "POSIXt" 
> > sdf2 <- filter(sdf1, "col1 > 0")
> > print(dtypes(sdf2))
> [[1]]
> [1] "col1"   "double"
> [[2]]
> [1] "col2"      "timestamp"
> > df2 <- collect(sdf2)
> > print(lapply(df2, class))
> $col1
> [1] "numeric"
> $col2
> [1] "numeric"
> {code}
> As we can see, the data type of col2 is unexpectedly converted to numeric in the collected local data frame df2.
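
The behaviour reported above resembles a general property of base R, sketched here without Spark. In base R, `c()` dispatches on the class of its first argument, so combining values that begin with a bare `NA` drops the POSIXct class. That SparkR's collect combines column values this way is an assumption and is not stated in the report. The variable names and the final workaround are illustrative only and are not the fix applied for this issue.

```r
# Base R only; no Spark session is needed for this sketch.
ts <- as.POSIXct("2017-01-01 12:00:01", tz = "UTC")

# POSIXct value first: c() dispatches to c.POSIXct and keeps the class.
ts_first <- c(ts, NA)
print(class(ts_first))

# NA first: NA is logical, so the default c() runs and the class is dropped.
na_first <- c(NA, ts)
print(class(na_first))

# Hypothetical workaround: rebuild POSIXct from the seconds since the epoch.
restored <- as.POSIXct(na_first, origin = "1970-01-01", tz = "UTC")
print(class(restored))
```

In the reproduction above, filtering on `col1 > 0` leaves the NA row at the top of col2, which matches the "NA first" case in this sketch.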




