spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject spark sql partitioned by date... read last date
Date Sun, 01 Nov 2015 20:03:35 GMT
hello all,
i am trying to get familiar with spark sql partitioning support.

my data is partitioned by date, so like this:
data/date=2015-01-01
data/date=2015-01-02
data/date=2015-01-03
...
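for reference, i read the whole dataset with partition discovery, roughly
like this (just a sketch, assuming parquet files and the spark 1.5
sqlContext; spark turns the date=... directories into a date column):

val df = sqlContext.read.parquet("data")
df.printSchema()  // shows the discovered "date" partition column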

let's say i would like a batch process to read data for the latest date
only. how do i proceed?
generally the latest date will be yesterday, but it could be a day or two
older.

i understand that i will have to do something like:
df.filter(df("date") === some_date_string_here)

however i do not know what some_date_string_here should be. i would like to
inspect the available dates and pick the latest. is there an efficient way
to find out what the available partitions are?
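for concreteness, would something like this be reasonable? (just a sketch,
assuming a spark-shell style sc, the df read above, and that the dates sort
correctly as plain yyyy-MM-dd strings)

import org.apache.hadoop.fs.{FileSystem, Path}

// list the date=... directories under data/ without scanning any data
val fs = FileSystem.get(sc.hadoopConfiguration)
val latestDate = fs.listStatus(new Path("data"))
  .map(_.getPath.getName)
  .filter(_.startsWith("date="))
  .map(_.stripPrefix("date="))
  .max  // lexicographic max is the latest date for yyyy-MM-dd strings

val latest = df.filter(df("date") === latestDate)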

thanks! koert
