spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: spark sql partitioned by date... read last date
Date Sun, 01 Nov 2015 21:06:35 GMT
Try filtering with the max date; in your case it could make more sense to represent the date as an int.
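Not part of the original reply, just a sketch of the suggestion above: if the partition value is encoded as an int like yyyymmdd, the latest partition is simply the numeric max (and ISO-8601 date strings happen to sort the same way lexicographically). The values below are illustrative, not from the thread:

```python
# Dates encoded as yyyymmdd ints: the newest partition is the numeric max.
partitions = [20150101, 20150102, 20150103]
latest = max(partitions)

# ISO-8601 date strings compare the same way lexicographically,
# so max() also picks the latest date for string-valued partitions.
iso_partitions = ["2015-01-01", "2015-01-02", "2015-01-03"]
latest_iso = max(iso_partitions)
```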

Sent from my iPhone

> On 01 Nov 2015, at 21:03, Koert Kuipers <koert@tresata.com> wrote:
> 
> hello all,
> i am trying to get familiar with spark sql partitioning support.
> 
> my data is partitioned by date, so like this:
> data/date=2015-01-01
> data/date=2015-01-02
> data/date=2015-01-03
> ...
> 
> let's say i would like a batch process to read data for the latest date only. how do i proceed?
> generally the latest date will be yesterday, but it could be a day older or maybe 2.

> 
> i understand that i will have to do something like:
> df.filter(df("date") === some_date_string_here)
> 
> however i do not know what some_date_string_here should be. i would like to inspect the available dates and pick the latest. is there an efficient way to find out what the available partitions are?
> 
> thanks! koert
> 
> 
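One way to answer the question above, as a hedged sketch rather than the thread's own solution: since the partition values are encoded in the Hive-style directory names (`data/date=2015-01-01`), you can list the directories, parse out the date strings, and filter on the max. The helper name and temp-dir layout below are assumptions for illustration:

```python
import os
import tempfile

def available_dates(data_dir):
    """Parse date values out of Hive-style partition directories
    named like date=2015-01-01 (layout taken from the thread)."""
    return sorted(
        name.split("=", 1)[1]
        for name in os.listdir(data_dir)
        if name.startswith("date=")
    )

# Recreate the directory layout from the thread in a temp dir.
root = tempfile.mkdtemp()
for d in ["2015-01-01", "2015-01-02", "2015-01-03"]:
    os.makedirs(os.path.join(root, "date=" + d))

dates = available_dates(root)
latest = max(dates)  # ISO date strings sort lexicographically, so max() works

# the resulting Spark filter would then be something like:
# df.filter(df("date") === latest)
```

Listing the directory is cheap compared to letting Spark scan every partition, and the same idea works against HDFS via its filesystem API instead of `os.listdir`.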
