spark-issues mailing list archives

From "David V. Hill (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-25467) Python date/datetime objects in dataframes increment by 1 day when converted to JSON
Date Wed, 19 Sep 2018 15:09:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David V. Hill updated SPARK-25467:
----------------------------------
    Description: 
When a DataFrame contains datetime.date or datetime.datetime instances and toJSON() is called
on it, the day is incremented in the JSON date representation.
{code}
import datetime
from pyspark.sql import Row

# sqc is assumed to be an existing SQLContext; the report does not show how it was created.

# Create a DataFrame containing datetime.date instances, convert to JSON and display
rows = [Row(cx=1, cy=2, dates=[datetime.date.fromordinal(1), datetime.date.fromordinal(2)])]

df = sqc.createDataFrame(rows)

df.collect()
# [Row(cx=1, cy=2, dates=[datetime.date(1, 1, 1), datetime.date(1, 1, 2)])]

df.toJSON().collect()
# ['{"cx":1,"cy":2,"dates":["0001-01-03","0001-01-04"]}']


# Issue also occurs with datetime.datetime instances

rows = [Row(cx=1, cy=2, dates=[datetime.datetime.fromordinal(1), datetime.datetime.fromordinal(2)])]

df = sqc.createDataFrame(rows)

df.collect()
# [Row(cx=1, cy=2, dates=[datetime.datetime(1, 1, 1, 0, 0, fold=1), datetime.datetime(1, 1, 2, 0, 0)])]

df.toJSON().collect()
# ['{"cx":1,"cy":2,"dates":["0001-01-02T23:50:36.000-06:00","0001-01-03T23:50:36.000-06:00"]}']

{code}
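
To make the size of the shift in the datetime.date example concrete, here is a minimal plain-Python sketch (no Spark required). It builds the JSON that Python's own isoformat() would produce for the same row and compares it with the toJSON() output quoted above. The names expected_json, observed_json and offsets are illustrative only and are not part of the report.

```python
import datetime
import json

# Inputs from the report: ordinals 1 and 2 in Python's proleptic Gregorian calendar.
dates = [datetime.date.fromordinal(1), datetime.date.fromordinal(2)]

# JSON that plain Python would write for the same row (ISO 8601 date strings).
expected_json = json.dumps(
    {"cx": 1, "cy": 2, "dates": [d.isoformat() for d in dates]},
    separators=(",", ":"),
)

# The string returned by df.toJSON().collect() in the report.
observed_json = '{"cx":1,"cy":2,"dates":["0001-01-03","0001-01-04"]}'

# Parse the observed YYYY-MM-DD strings back into date objects.
observed_dates = [
    datetime.date(*map(int, s.split("-")))
    for s in json.loads(observed_json)["dates"]
]

# Day difference between what Spark wrote and what was put in.
offsets = [(o - d).days for o, d in zip(observed_dates, dates)]

print(expected_json)  # {"cx":1,"cy":2,"dates":["0001-01-01","0001-01-02"]}
print(offsets)        # [2, 2]
```

For these particular inputs the date strings come back two days later than the Python values, so the offset is worth checking per value rather than assuming a fixed one-day shift.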
 

 

  was:
When Dataframes contains datetime.date or datetime.datetime objects and toJSON() is called
on the Dataframe, the day is incremented in the JSON date representation.

{code:python}
# Create a Dataframe containing datetime.date instances, convert to JSON and display
rows = [Row(cx=1, cy=2, dates=[datetime.date.fromordinal(1), datetime.date.fromordinal(2)])]

df = sqc.createDataFrame(rows)

df.collect()
[Row(cx=1, cy=2, dates=[datetime.date(1, 1, 1), datetime.date(1, 1, 2)])]

df.toJSON().collect()
['{"cx":1,"cy":2,"dates":["0001-01-03","0001-01-04"]}']


# Issue also occurs with datetime.datetime instances

rows = [Row(cx=1, cy=2, dates=[datetime.datetime.fromordinal(1), datetime.datetime.fromordinal(2)])]

df = sqc.createDataFrame(rows)

df.collect()
[Row(cx=1, cy=2, dates=[datetime.datetime(1, 1, 1, 0, 0, fold=1), datetime.datetime(1, 1, 2, 0, 0)])]

df.toJSON().collect()
['{"cx":1,"cy":2,"dates":["0001-01-02T23:50:36.000-06:00","0001-01-03T23:50:36.000-06:00"]}']

{code}
 

 


> Python date/datetime objects in dataframes increment by 1 day when converted to JSON
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-25467
>                 URL: https://issues.apache.org/jira/browse/SPARK-25467
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.3.1
>         Environment: Spark 2.3.1
> Python 3.6.5 | packaged by conda-forge | (default, Apr  6 2018, 13:39:56) 
> [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (build 1.8.0_181-b13)
> OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
> Centos 7 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 GNU/Linux
>            Reporter: David V. Hill
>            Priority: Major
>
> When a DataFrame contains datetime.date or datetime.datetime instances and toJSON() is called on it, the day is incremented in the JSON date representation.
> {code}
> # Create a Dataframe containing datetime.date instances, convert to JSON and display
> rows = [Row(cx=1, cy=2, dates=[datetime.date.fromordinal(1), datetime.date.fromordinal(2)])]
> df = sqc.createDataFrame(rows)
> df.collect()
> [Row(cx=1, cy=2, dates=[datetime.date(1, 1, 1), datetime.date(1, 1, 2)])]
> df.toJSON().collect()
> ['{"cx":1,"cy":2,"dates":["0001-01-03","0001-01-04"]}']
> # Issue also occurs with datetime.datetime instances
> rows = [Row(cx=1, cy=2, dates=[datetime.datetime.fromordinal(1), datetime.datetime.fromordinal(2)])]
> df = sqc.createDataFrame(rows)
> df.collect()
> [Row(cx=1, cy=2, dates=[datetime.datetime(1, 1, 1, 0, 0, fold=1), datetime.datetime(1, 1, 2, 0, 0)])]
> df.toJSON().collect()
> ['{"cx":1,"cy":2,"dates":["0001-01-02T23:50:36.000-06:00","0001-01-03T23:50:36.000-06:00"]}']
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

