spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Davies Liu <dav...@databricks.com>
Subject Re: unix_timestamp() time zone problem
Date Fri, 18 Mar 2016 21:15:38 GMT
Could you try to cast the timestamp as long?

Internally, timestamp are stored as microseconds in UTC, you will got
seconds in UTC if you cast it to long.

On Thu, Mar 17, 2016 at 1:28 PM, Andy Davidson <
Andy@santacruzintegration.com> wrote:

> I am using python spark 1.6 and the --packages
> datastax:spark-cassandra-connector:1.6.0-M1-s_2.10
>
> I need to convert a time stamp string into a unix epoch time stamp. The
> function unix_timestamp() function assume current time zone. How ever my
> string data is UTC and encodes the time zone as zero. I have not been able
> to find a way to get the unix time calculated correctly. simpleDateFormat
> does not have good time zone support. Any suggestions?
>
> I could write a UDF and to adjust for time zones how ever this seems like
>  a hack
>
> I tried using to_utc_timestamp(created, 'gmt’) how ever this creates a
> timestamp. I have not been able to figure out how to convert this to a unix
> time sample I.e a long representing epoch
>
> Any suggestions?
>
> stmnt = "select \
>             row_key, created, count, unix_timestamp(created) as
> unixTimeStamp, \
>             unix_timestamp(created, 'yyyy-MM-dd HH:mm:ss.z') as etc \
>          from \
>             rawTable \
>          where \
>              (created > '{0}') and (created <= '{1}') \
>              and \
>              (row_key = ‘blue' \
>                         or row_key = ‘red' \
>                     )".format('2016-03-12 00:30:00+0000', '2016-03-12
> 04:30:00+0000’)
>
>
> Sample out put
>
> root
>  |-- row_key: string (nullable = true)
>  |-- created: timestamp (nullable = true)
>  |-- count: long (nullable = true)
>  |-- unixTimeStamp: long (nullable = true)
>  |-- etc: long (nullable = true)
>
> 2016-03-12 00:30:30.0 should be 1457742630 not 1457771430
>
> +-------------+---------------------+-----+-------------+----------+
> |row_key      |created    |count|unixTimeStamp|utc    |
> +-------------+---------------------+-----+-------------+----------+
> |red|2016-03-12 00:30:30.0|2    |1457771430   |1457771430|
> |red|2016-03-12 00:30:45.0|1    |1457771445   |1457771445|
>
>
>
> static Column
> <https://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/sql/Column.html>
*unix_timestamp
> <https://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/sql/functions.html#unix_timestamp(org.apache.spark.sql.Column)>*
> (Column
> <https://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/sql/Column.html>
>  s)
> Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in
> seconds), using the default timezone and the default locale, return null if
> fail.
> static Column
> <https://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/sql/Column.html>
*unix_timestamp
> <https://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/sql/functions.html#unix_timestamp(org.apache.spark.sql.Column,%20java.lang.String)>*
> (Column
> <https://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/sql/Column.html>
s,
> java.lang.String p)
> Convert time string with given pattern (see [
> http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html])
> to Unix time stamp (in seconds), return null if fail.
>

Mime
View raw message