What little I know about working with timestamps is based on https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html

Using the blog's example of converting human-friendly date strings to timestamps, I was able to figure out how to convert epoch times (in seconds) to timestamps. The same trick did not work for epoch times in milliseconds.

Any suggestions would be greatly appreciated.


Andy

Working with epoch times

ref: http://www.epochconverter.com/

Epoch timestamp: 1456050620

Timestamp in milliseconds: 1456050620000

Human time (GMT): Sun, 21 Feb 2016 10:30:20 GMT

Human time (your time zone): 2/21/2016, 2:30:20 AM


# epoch timestamp example
# note: the values in the last two rows have 11 digits (an extra digit),
# which is why those rows land in the year 2431 in the output below
data = [
  ("1456050620", "1456050621", 1),
  ("1456050622", "14560506203", 2),
  ("14560506204", "14560506205", 3)]
df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
# cast the epoch-time strings to longs
df = df.select(
  df.start_time.cast("long").alias("start_time"),
  df.end_time.cast("long").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)
# cast the longs to timestamps (Spark interprets the value as seconds since the epoch)
df = df.select(
  df.start_time.cast("timestamp").alias("start_time"),
  df.end_time.cast("timestamp").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)
root
 |-- start_time: long (nullable = true)
 |-- end_time: long (nullable = true)
 |-- id: long (nullable = true)

+-----------+-----------+---+
|start_time |end_time   |id |
+-----------+-----------+---+
|1456050620 |1456050621 |1  |
|1456050622 |14560506203|2  |
|14560506204|14560506205|3  |
+-----------+-----------+---+

root
 |-- start_time: timestamp (nullable = true)
 |-- end_time: timestamp (nullable = true)
 |-- id: long (nullable = true)

+---------------------+---------------------+---+
|start_time           |end_time             |id |
+---------------------+---------------------+---+
|2016-02-21 02:30:20.0|2016-02-21 02:30:21.0|1  |
|2016-02-21 02:30:22.0|2431-05-28 02:03:23.0|2  |
|2431-05-28 02:03:24.0|2431-05-28 02:03:25.0|3  |
+---------------------+---------------------+---+
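
As far as I can tell, casting a numeric column to "timestamp" interprets the value as seconds since the Unix epoch, which is also why the 11-digit rows above land in the year 2431. The same conversion can be written with from_unixtime() from the blog post; a minimal sketch (note that from_unixtime returns a formatted string in the session time zone, not a timestamp column):

from pyspark.sql.functions import from_unixtime

# re-create the dataframe from the original epoch-second strings
raw = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
# from_unixtime(seconds) formats an epoch-seconds value as a
# 'yyyy-MM-dd HH:mm:ss' string
raw.select(
  from_unixtime(raw.start_time.cast("long")).alias("start_time"),
  from_unixtime(raw.end_time.cast("long")).alias("end_time"),
  raw.id).show(truncate=False)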

# working with millisecond times
data = [
  ("1456050620000", "1456050620000", 1)]
  
df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
# cast the millisecond strings to longs
df = df.select(
  df.start_time.cast("long").alias("start_time"),
  df.end_time.cast("long").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)
# cast the longs to timestamps (the millisecond values are wrongly treated as seconds)
df = df.select(
  df.start_time.cast("timestamp").alias("start_time"),
  df.end_time.cast("timestamp").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)
root
 |-- start_time: long (nullable = true)
 |-- end_time: long (nullable = true)
 |-- id: long (nullable = true)

+-------------+-------------+---+
|start_time   |end_time     |id |
+-------------+-------------+---+
|1456050620000|1456050620000|1  |
+-------------+-------------+---+

root
 |-- start_time: timestamp (nullable = true)
 |-- end_time: timestamp (nullable = true)
 |-- id: long (nullable = true)

+----------------------+----------------------+---+
|start_time            |end_time              |id |
+----------------------+----------------------+---+
|48110-05-29 10:33:20.0|48110-05-29 10:33:20.0|1  |
+----------------------+----------------------+---+
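
My guess is that the cast treats the millisecond value as seconds, which would explain the output: 1456050620000 seconds after 1970 is roughly 46,000 years out, which matches the year 48110 above. If that is right, dividing by 1000 before the cast should give the correct result; a sketch of that workaround, though I am not sure it is the idiomatic approach:

# re-create the dataframe from the millisecond strings
df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
# divide by 1000 so the value is in seconds before casting;
# the division yields a double, and casting a double to timestamp
# keeps the fractional part as sub-second precision
df = df.select(
  (df.start_time.cast("long") / 1000).cast("timestamp").alias("start_time"),
  (df.end_time.cast("long") / 1000).cast("timestamp").alias("end_time"),
  df.id)
df.show(truncate=False)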