spark-user mailing list archives

From Nicholas Hakobian <nicholas.hakob...@rallyhealth.com>
Subject Re: how do you deal with datetime in Spark?
Date Tue, 03 Oct 2017 18:04:00 GMT
I'd suggest first converting the string containing your date/time to a
TimestampType or a DateType. The built-in functions for year, month, day,
etc. will then work as expected. If your date is in a "standard" format,
you can perform the conversion just by casting the column to a date or
timestamp type. The formats that a cast can convert automatically are
listed at this link:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L270-L295

If casting won't work, you can convert it manually by passing a format
string to the following built-in function:
http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.functions.unix_timestamp

The format string uses Java SimpleDateFormat patterns, if I remember
correctly (
http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html).

Nicholas Szandor Hakobian, Ph.D.
Staff Data Scientist
Rally Health
nicholas.hakobian@rallyhealth.com


On Tue, Oct 3, 2017 at 10:43 AM, Adaryl Wakefield <
adaryl.wakefield@hotmail.com> wrote:

> I gave myself a project to start actually writing Spark programs. I’m
> using Scala and Spark 2.2.0. In my project, I had to do some grouping and
> filtering by dates. It was awful and took forever. I was trying to use
> dataframes and SQL as much as possible. I see that there are date functions
> in the dataframe API but trying to use them was frustrating. Even following
> code samples was a headache because apparently the code is different
> depending on which version of Spark you are using. I was really hoping for
> a rich set of date functions like you’d find in T-SQL but I never really
> found them.
>
> Is there a best practice for dealing with dates and times in Spark? I feel
> like taking a date/time string, converting it to a date/time object, and
> then manipulating data based on the various components of the timestamp
> object (hour, day, year, etc.) should be a heck of a lot easier than what
> I’m finding, and perhaps I’m just not looking in the right place.
>
> You can see my work here:
> https://github.com/BobLovesData/Apache-Spark-In-24-Hours/blob/master/src/net/massstreet/hour10/BayAreaBikeAnalysis.scala
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
>
> www.massstreet.net
>
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData <http://twitter.com/BobLovesData>