spark-user mailing list archives

From Adaryl Wakefield <>
Subject RE: how do you deal with datetime in Spark?
Date Tue, 03 Oct 2017 20:14:25 GMT
In my first attempt, I actually tried using case classes and then putting them into a Dataset.
Scala, I guess, doesn't have a native datetime data type, and I still wound up having to do
some sort of conversion when I tried to put the data into the Dataset, because I still had
to define the column as a string. I mean, is that right? Is it not possible to create a case
class with a field of type date or timestamp?
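For reference, a minimal sketch of the case-class approach: Spark's built-in encoders do map `java.sql.Timestamp` and `java.sql.Date` fields to TimestampType/DateType columns, so the case class itself doesn't have to use a string. The `Event` class and its values here are hypothetical, and this assumes Spark 2.2 running locally:

```scala
import java.sql.Timestamp

import org.apache.spark.sql.SparkSession

// Hypothetical record type: a java.sql.Timestamp field encodes as a
// real timestamp column, not a string.
case class Event(name: String, occurredAt: Timestamp)

object CaseClassDates {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("case-class-dates")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(
      Event("login", Timestamp.valueOf("2017-10-03 13:04:00")),
      Event("logout", Timestamp.valueOf("2017-10-03 14:30:00"))
    ).toDS()

    // Schema shows occurredAt as timestamp, so year()/month()/etc. work on it.
    ds.printSchema()

    spark.stop()
  }
}
```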

Adaryl "Bob" Wakefield, MBA
Mass Street Analytics, LLC
Twitter: @BobLovesData<>

From: Nicholas Hakobian []
Sent: Tuesday, October 3, 2017 1:04 PM
To: Adaryl Wakefield <>
Subject: Re: how do you deal with datetime in Spark?

I'd suggest first converting the string containing your date/time to a TimestampType or a
DateType. The built-in functions for year, month, day, etc. will then work as expected.
If your date is in a "standard" format, you can perform the conversion just by casting the
column to a date or timestamp type. The list of formats it can auto-convert is given at this
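A quick sketch of the cast-then-extract pattern (the column names and sample values are made up; this assumes Spark 2.2 running locally):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, dayofmonth, month, year}

object CastExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("cast-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical DataFrame of ISO-style date/time strings.
    val df = Seq("2017-10-03 13:04:25", "2016-01-15 08:00:00").toDF("ts_string")

    // A plain cast handles "standard" formats like the ones above.
    val withTs = df.withColumn("ts", col("ts_string").cast("timestamp"))

    // Once the column is a real timestamp, the built-in functions apply.
    withTs.select(year(col("ts")), month(col("ts")), dayofmonth(col("ts"))).show()

    spark.stop()
  }
}
```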

If casting won't work, you can manually convert it by specifying a format string with the
following built-in function:

The format string uses Java's SimpleDateFormat patterns, if I remember correctly.
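The archive dropped the link, but the functions meant here are presumably along the lines of `unix_timestamp` and `to_timestamp` (the latter added in Spark 2.2), both of which accept a SimpleDateFormat-style pattern. A sketch with a made-up non-standard format:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_timestamp, unix_timestamp}

object FormatStringExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("format-string-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical strings that a plain cast to timestamp would not parse.
    val df = Seq("10/03/2017 13:04", "01/15/2016 08:00").toDF("raw")

    // to_timestamp parses with a SimpleDateFormat pattern and yields a timestamp.
    val parsed = df.withColumn("ts", to_timestamp(col("raw"), "MM/dd/yyyy HH:mm"))

    // unix_timestamp parses the same way but yields seconds since the epoch.
    val epoch = df.withColumn("secs", unix_timestamp(col("raw"), "MM/dd/yyyy HH:mm"))

    parsed.show(false)
    epoch.show(false)

    spark.stop()
  }
}
```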

Nicholas Szandor Hakobian, Ph.D.
Staff Data Scientist
Rally Health<>

On Tue, Oct 3, 2017 at 10:43 AM, Adaryl Wakefield <> wrote:
I gave myself a project to start actually writing Spark programs. I’m using Scala and Spark
2.2.0. In my project, I had to do some grouping and filtering by dates. It was awful and took
forever. I was trying to use DataFrames and SQL as much as possible. I see that there are
date functions in the DataFrame API, but trying to use them was frustrating. Even following
code samples was a headache, because apparently the code is different depending on which version
of Spark you are using. I was really hoping for a rich set of date functions like you’d
find in T-SQL, but I never really found them.

Is there a best practice for dealing with dates and times in Spark? I feel like taking a date/time
string, converting it to a date/time object, and then manipulating data based on the various
components of the timestamp object (hour, day, year, etc.) should be a heck of a lot easier
than what I’m finding, and perhaps I’m just not looking in the right place.
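For the record, the grouping-and-filtering workflow described above can be done with the DataFrame date functions, roughly analogous to T-SQL's YEAR()/MONTH(). A sketch with hypothetical sales data, assuming Spark 2.2 locally:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, month, year}

object GroupByDate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("group-by-date")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sales records with string dates.
    val sales = Seq(
      ("2017-01-05", 100.0),
      ("2017-01-20", 250.0),
      ("2017-02-03", 75.0)
    ).toDF("sale_date", "amount")

    // Cast once, then filter and group on the typed column.
    val typed = sales.withColumn("d", col("sale_date").cast("date"))

    typed
      .filter(year(col("d")) === 2017)          // like WHERE YEAR(d) = 2017
      .groupBy(month(col("d")).as("month"))     // like GROUP BY MONTH(d)
      .agg(count("*").as("n"))
      .show()

    spark.stop()
  }
}
```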

You can see my work here:

Adaryl "Bob" Wakefield, MBA
Mass Street Analytics, LLC
Twitter: @BobLovesData<>
