spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: spark sql partitioned by date... read last date
Date Sun, 01 Nov 2015 21:09:08 GMT
good idea. with the dates sorting correctly alphabetically i should be able
to do something similar with strings
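
a minimal sketch of that idea, assuming the partition directory names (e.g. "date=2015-01-03") have already been listed. the helper name latestDate and the object LatestPartition are hypothetical, not part of spark. it relies only on the fact that yyyy-MM-dd strings sort chronologically when compared as plain strings:

```scala
// Hypothetical helper (not a Spark API): pick the latest partition value
// from a list of directory names such as "date=2015-01-03".
object LatestPartition {
  private val Prefix = "date="

  // Keep only names that start with "date=", strip the prefix, and return
  // the largest remaining string. For zero-padded yyyy-MM-dd dates, string
  // order is the same as chronological order.
  def latestDate(dirNames: Seq[String]): Option[String] = {
    val dates = dirNames.filter(_.startsWith(Prefix)).map(_.stripPrefix(Prefix))
    if (dates.isEmpty) None else Some(dates.max)
  }
}

// Example with a hard-coded listing (assumed input; non-partition
// entries such as "_SUCCESS" are ignored):
val names = Seq("date=2015-01-01", "date=2015-01-03", "date=2015-01-02", "_SUCCESS")
val latest = LatestPartition.latestDate(names)
```

how the names get listed depends on the storage: on a local filesystem something like new java.io.File("data").list() would do, and on hdfs the hadoop FileSystem.listStatus call is one option. the resulting string could then be used as some_date_string_here in the filter, provided the date column is read as a string.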

On Sun, Nov 1, 2015 at 4:06 PM, Jörn Franke <jornfranke@gmail.com> wrote:

> Try with max date, in your case it could make more sense to represent the
> date as int
>
> On 01 Nov 2015, at 21:03, Koert Kuipers <koert@tresata.com> wrote:
>
> hello all,
> i am trying to get familiar with spark sql partitioning support.
>
> my data is partitioned by date, so like this:
> data/date=2015-01-01
> data/date=2015-01-02
> data/date=2015-01-03
> ...
>
> let's say i would like a batch process to read data for the latest date
> only. how do i proceed?
> generally the latest date will be yesterday, but it could be a day older
> or maybe 2.
>
> i understand that i will have to do something like:
> df.filter(df("date") === some_date_string_here)
>
> however i do not know what some_date_string_here should be. i would like
> to inspect the available dates and pick the latest. is there an efficient
> way to find out what the available partitions are?
>
> thanks! koert
>
>
>
