spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Yu <yuzhih...@gmail.com>
Subject Re: how to covert millisecond time to SQL timeStamp
Date Tue, 02 Feb 2016 04:14:28 GMT
See related thread on using Joda DateTime:
http://search-hadoop.com/m/q3RTtSfi342nveex1&subj=RE+NPE+
when+using+Joda+DateTime

On Mon, Feb 1, 2016 at 7:44 PM, Kevin Mellott <kevin.r.mellott@gmail.com>
wrote:

> I've had pretty good success using Joda-Time
> <http://www.joda.org/joda-time/index.html> for date/time manipulations
> within Spark applications. You may be able to use the *DateTIme* constructor
> below, if you are starting with milliseconds.
>
> DateTime
>
> public DateTime(long instant)
>
> Constructs an instance set to the milliseconds from 1970-01-01T00:00:00Z
> using ISOChronology in the default time zone.
> Parameters:instant - the milliseconds from 1970-01-01T00:00:00Z
>
> On Mon, Feb 1, 2016 at 5:51 PM, Andy Davidson <
> Andy@santacruzintegration.com> wrote:
>
>> What little I know about working with timestamps is based on
>> https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html
>>
>> Using the example of dates formatted into human friend strings ->
>> timeStamps I was able to figure out how to convert Epoch times to
>> timestamps. The same trick did not work for millisecond times.
>>
>> Any suggestions would be greatly appreciated.
>>
>>
>> Andy
>>
>> Working with epoch times
>> <http://localhost:8888/notebooks/sparkTimeSeriesExperiments.ipynb#Working-with-epock-times>
>>
>> ref: http://www.epochconverter.com/
>>
>> Epoch timestamp: 1456050620
>>
>> Timestamp in milliseconds: 1456050620000
>>
>> Human time (GMT): Sun, 21 Feb 2016 10:30:20 GMT
>>
>> Human time (your time zone): 2/21/2016, 2:30:20 AM
>>
>>
>> # Epoch time stamp example
>>
>> data = [
>>
>>   ("1456050620", "1456050621", 1),
>>
>>   ("1456050622", "14560506203", 2),
>>
>>   ("14560506204", "14560506205", 3)]
>>
>> df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
>>
>> ​
>>
>> # convert epoch time strings in to spark timestamps
>>
>> df = df.select(
>>
>>   df.start_time.cast("long").alias("start_time"),
>>
>>   df.end_time.cast("long").alias("end_time"),
>>
>>   df.id)
>>
>> df.printSchema()
>>
>> df.show(truncate=False)
>>
>> ​
>>
>> # convert longs to timestamps
>>
>> df = df.select(
>>
>>   df.start_time.cast("timestamp").alias("start_time"),
>>
>>   df.end_time.cast("timestamp").alias("end_time"),
>>
>>   df.id)
>>
>> df.printSchema()
>>
>> df.show(truncate=False)
>>
>> ​
>>
>> root
>>  |-- start_time: long (nullable = true)
>>  |-- end_time: long (nullable = true)
>>  |-- id: long (nullable = true)
>>
>> +-----------+-----------+---+
>> |start_time |end_time   |id |
>> +-----------+-----------+---+
>> |1456050620 |1456050621 |1  |
>> |1456050622 |14560506203|2  |
>> |14560506204|14560506205|3  |
>> +-----------+-----------+---+
>>
>> root
>>  |-- start_time: timestamp (nullable = true)
>>  |-- end_time: timestamp (nullable = true)
>>  |-- id: long (nullable = true)
>>
>> +---------------------+---------------------+---+
>> |start_time           |end_time             |id |
>> +---------------------+---------------------+---+
>> |2016-02-21 02:30:20.0|2016-02-21 02:30:21.0|1  |
>> |2016-02-21 02:30:22.0|2431-05-28 02:03:23.0|2  |
>> |2431-05-28 02:03:24.0|2431-05-28 02:03:25.0|3  |
>> +---------------------+---------------------+---+
>>
>>
>> In [21]:
>>
>> # working with millisecond times
>>
>> data = [
>>
>>   ("1456050620000", "1456050620000", 1)]
>>
>>   df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
>>
>> ​
>>
>> # convert epoch time strings in to spark timestamps
>>
>> df = df.select(
>>
>>   df.start_time.cast("long").alias("start_time"),
>>
>>   df.end_time.cast("long").alias("end_time"),
>>
>>   df.id)
>>
>> df.printSchema()
>>
>> df.show(truncate=False)
>>
>> ​
>>
>> # convert longs to timestamps
>>
>> df = df.select(
>>
>>   df.start_time.cast("timestamp").alias("start_time"),
>>
>>   df.end_time.cast("timestamp").alias("end_time"),
>>
>>   df.id)
>>
>> df.printSchema()
>>
>> df.show(truncate=False)
>>
>> root
>>  |-- start_time: long (nullable = true)
>>  |-- end_time: long (nullable = true)
>>  |-- id: long (nullable = true)
>>
>> +-------------+-------------+---+
>> |start_time   |end_time     |id |
>> +-------------+-------------+---+
>> |1456050620000|1456050620000|1  |
>> +-------------+-------------+---+
>>
>> root
>>  |-- start_time: timestamp (nullable = true)
>>  |-- end_time: timestamp (nullable = true)
>>  |-- id: long (nullable = true)
>>
>> +----------------------+----------------------+---+
>> |start_time            |end_time              |id |
>> +----------------------+----------------------+---+
>> |48110-05-29 10:33:20.0|48110-05-29 10:33:20.0|1  |
>> +----------------------+----------------------+---+
>>
>>
>>
>

Mime
View raw message