airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: Simple question about schedule_interval establishing clear interval boundaries.
Date Tue, 21 Feb 2017 21:57:57 GMT
We don’t truncate this macro; you get the full object. “execution_date” is also available,
and it is not abbreviated either.

- Bolke
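
To illustrate the difference being discussed, here is a minimal sketch using only the standard library. The `execution_date` value is a made-up example, and the `ds` line simply mirrors the expression quoted further down in this thread; it is not a copy of Airflow's code.

```python
from datetime import datetime

# Hypothetical execution date for an hourly run (illustrative value only).
execution_date = datetime(2017, 2, 21, 14, 0, 0)

# Mirrors the expression quoted in the thread: the ISO string is cut
# down to its first 10 characters, i.e. the 'YYYY-MM-DD' date part.
ds = execution_date.isoformat()[:10]

# The full object keeps its time part, so it remains usable for hourly work.
full_timestamp = execution_date.isoformat()

print(ds)
print(full_timestamp)
```

The truncation only affects the string form; the underlying object still carries hours, minutes and seconds.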

> On 21 Feb 2017, at 22:53, Gerard Toonstra <gtoonstra@gmail.com> wrote:
> 
> Hi Bolke,
> 
> Yep, that would work. So weekly and monthly processing can then be executed
> quite easily.
> 
> The only remaining issue is that these are dates, so they wouldn't work
> for a datetime and thus for e.g. hourly processing?
> 
> I base that on my observation that:
> 
> ds = self.execution_date.isoformat()[:10]
> 
> So in the code, airflow would internally work with a datetime representation
> of execution_date, but for the macro it gets truncated to just the date part,
> 'YYYY-MM-DD'?
> 
> 
> 
> On Tue, Feb 21, 2017 at 10:44 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:
> 
>> Hi Gerard,
>> 
>> In 1.8 we introduced prev_execution_date and next_execution_date. Is that
>> what you were looking for?
>> 
>> https://github.com/apache/incubator-airflow/blob/50702d06187035c99e51ea936c756c00332c4a4a/airflow/models.py#L1489
>> 
>> Bolke
>> 
>>> On 21 Feb 2017, at 22:41, Gerard Toonstra <gtoonstra@gmail.com> wrote:
>>> 
>>> Hey all,
>>> 
>>> I'm writing up a bit more about best practices for airflow and realize that
>>> there may be one important macro that's missing, but which sounds really
>>> useful. This is a list of the default macros:
>>> 
>>> https://airflow.incubator.apache.org/code.html#macros
>>> 
>>> The "execution_date" or "ds" is some interval end date, but there's no
>>> clear macro that defines the start date of that interval, except
>>> "yesterday_ds". Obviously this holds when you run a daily schedule, but
>>> breaks apart when you run things on an hourly or weekly interval for
>>> example.
>>> 
>>> There are three issues here:
>>> - What do people usually do to determine the start interval?  Assume a
>>> daily schedule and use ds and yesterday_ds?
>>> - execution_date has no time part and is a pure date, so this implies that
>>> most airflow tasks are daily processing tasks with a clear midnight
>>> boundary. In the case of hourly processing, one would have to rely on the
>>> machine clock and again assume a schedule interval to establish boundaries
>>> in such interval schedules?  (+issues related to clock-syncing and no
>>> guarantees on exact start times).
>>> - And in the other direction, what's a good approach towards non-daily
>>> schedules (weekly/monthly schedules)?
>>> 
>>> Rgds,
>>> 
>>> Gerard
>> 
>> 
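
As a rough sketch of the idea behind prev_execution_date and next_execution_date mentioned in the thread, the example below derives neighbouring dates for a fixed-length schedule using plain datetime arithmetic. The helper name `neighbouring_dates` and the sample dates are hypothetical, and this is not how Airflow itself computes the values; it assumes the schedule can be expressed as a timedelta.

```python
from datetime import datetime, timedelta


def neighbouring_dates(execution_date, schedule_interval):
    """Return (previous, next) execution dates for a fixed-length schedule.

    Hypothetical helper for illustration; assumes schedule_interval is a
    timedelta, so it does not cover cron-style or calendar-based schedules.
    """
    previous_date = execution_date - schedule_interval
    next_date = execution_date + schedule_interval
    return previous_date, next_date


# Hourly schedule: the time part matters, so full datetimes are used.
hourly_prev, hourly_next = neighbouring_dates(
    datetime(2017, 2, 21, 14), timedelta(hours=1)
)

# Weekly schedule: same arithmetic, larger step.
weekly_prev, weekly_next = neighbouring_dates(
    datetime(2017, 2, 20), timedelta(weeks=1)
)

print(hourly_prev, hourly_next)
print(weekly_prev, weekly_next)
```

Because the boundaries come from the execution date and the interval rather than from the machine clock, the same code works for hourly and weekly runs. Monthly schedules are not a fixed number of days, so they would need calendar-aware logic instead of a timedelta.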

