spark-user mailing list archives

From Steve Loughran
Subject Re: how do you deal with datetime in Spark?
Date Tue, 03 Oct 2017 19:18:37 GMT

On 3 Oct 2017, at 18:43, Adaryl Wakefield wrote:

I gave myself a project to start actually writing Spark programs. I’m using Scala and Spark
2.2.0. In my project, I had to do some grouping and filtering by dates. It was awful and took
forever. I was trying to use dataframes and SQL as much as possible. I see that there are
date functions in the dataframe API but trying to use them was frustrating. Even following
code samples was a headache because apparently the code is different depending on which version
of Spark you are using. I was really hoping for a rich set of date functions like you’d
find in T-SQL but I never really found them.

Is there a best practice for dealing with dates and time in Spark? I feel like taking a date/time
string and converting it to a date/time object and then manipulating data based on the various
components of the timestamp object (hour, day, year etc.) should be a heck of a lot easier
than what I’m finding and perhaps I’m just not looking in the right place.
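
For what it's worth, the string-to-timestamp-to-components flow described above can be sketched with the built-in column functions. This is only a rough sketch, not a recommended practice from this thread: the object name, the helper `withDateParts`, the column names and the sample rows are all made up for illustration, and the "dd/MM/yyyy HH:mm" pattern is an assumption about the input. It uses `unix_timestamp(col, pattern)` followed by a cast, which has been available since Spark 1.5; Spark 2.2 also added `to_timestamp` and `to_date` overloads that take a format string.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, dayofmonth, hour, month, unix_timestamp, year}

object DateTimeSketch {

  // Hypothetical helper: parse a string column into a timestamp column "ts"
  // and add year/month/day/hour columns derived from it.
  def withDateParts(df: DataFrame, column: String, pattern: String): DataFrame = {
    // unix_timestamp yields seconds since the epoch (null if parsing fails
    // in Spark 2.x); casting that long to "timestamp" gives a timestamp.
    val ts = unix_timestamp(col(column), pattern).cast("timestamp")
    df.withColumn("ts", ts)
      .withColumn("year", year(col("ts")))
      .withColumn("month", month(col("ts")))
      .withColumn("day", dayofmonth(col("ts")))
      .withColumn("hour", hour(col("ts")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("datetime-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: timestamps arriving as strings.
    val raw = Seq(
      ("a", "31/01/2012 19:31"),
      ("b", "31/01/2012 19:32"),
      ("c", "31/01/2012 20:11")
    ).toDF("id", "event_time")

    val parsed = withDateParts(raw, "event_time", "dd/MM/yyyy HH:mm")

    // Filter on one component, then group on another.
    parsed.filter(col("year") === 2012)
      .groupBy("hour")
      .count()
      .orderBy("hour")
      .show()

    spark.stop()
  }
}
```

Once the value is a real timestamp column, filtering and grouping work on the derived columns like any other integer column, which avoids string manipulation in the query itself.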

You can see my work here:

Once you've done that one, I have a few hundred MB of London bike stats if you want them. Their
timestamps come in as strings, but "01/01/1970" is by far the most popular dropoff time, which
is 0 in the epoch...

9809600,0,6248,01/01/1970 00:00,0,NA,31/01/2012 19:31,365,City Road: Angel
9806201,0,6422,01/01/1970 00:00,0,NA,31/01/2012 19:32,17,Hatton Wall: Holborn
9802063,0,4096,01/01/1970 00:00,0,NA,31/01/2012 19:34,338,Wellington Street : Strand
9804765,0,5276,01/01/1970 00:00,0,NA,31/01/2012 19:37,93,Cloudesley Road: Angel
9806779,1970,14,31/01/2012 20:11,410,Edgware Road Station: Paddington
9813333,0,5810,01/01/1970 00:00,0,NA,31/01/2012 19:39,114,Park Road (Baker Street): Regent's
9803952,0,5682,01/01/1970 00:00,0,NA,31/01/2012 19:41,210,Hinde Street: Marylebone
9818659,0,5572,01/01/1970 00:00,0,NA,31/01/2012 19:41,87,Devonshire Square: Liverpool Street
9808144,0,5244,01/01/1970 00:00,0,NA,31/01/2012 19:42,374,Waterloo Station 1: Waterloo
9814365,0,5422,01/01/1970 00:00,0,NA,31/01/2012 19:48,15,Great Russell Street: Bloomsbury
9816863,0,6079,01/01/1970 00:00,0,NA,31/01/2012 19:49,258,Kensington Gore: Knightsbridge
9818469,0,4903,01/01/1970 00:00,0,NA,31/01/2012 19:50,341,Craven Street: Strand
9811512,0,5572,01/01/1970 00:00,0,NA,31/01/2012 19:50,298,Curlew Street: Shad Thames
9817931,0,708,01/01/1970 00:00,0,NA,31/01/2012 19:51,341,Craven Street: Strand
9816429,0,3210,01/01/1970 00:00,0,NA,31/01/2012 19:59,388,Southampton Street: Strand
9806284,0,4359,01/01/1970 00:00,0,NA,31/01/2012 20:06,335,Tavistock Street: Covent Garden
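
One way to cope with rows like these is to parse the strings with an explicit pattern and treat the epoch-zero value as "missing" rather than as a real time. The following is a minimal plain-Scala sketch under that assumption; the object name `BikeTimestamps` and the function `parse` are hypothetical, and the idea that 01/01/1970 00:00 means "no value recorded" is a guess based on the sample above, not something the data source confirms.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

object BikeTimestamps {
  // Pattern matching the sample rows, e.g. "31/01/2012 19:31".
  private val fmt = DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm")

  // Assumption: the epoch-zero timestamp is a placeholder, not a real event.
  private val epochZero = LocalDateTime.of(1970, 1, 1, 0, 0)

  // Returns None for unparseable input (such as "NA") and for the
  // epoch-zero placeholder; otherwise returns the parsed timestamp.
  def parse(s: String): Option[LocalDateTime] =
    Try(LocalDateTime.parse(s.trim, fmt)).toOption.filter(_ != epochZero)
}
```

A function of this shape could be applied inside a `map` over a Dataset or wrapped in a UDF, so that downstream grouping and filtering never see the placeholder as a genuine 1970 timestamp.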

Adaryl "Bob" Wakefield, MBA
Mass Street Analytics, LLC
Twitter: @BobLovesData
