airflow-dev mailing list archives

From Robin Edwards <>
Subject dynamic dag generation
Date Mon, 03 Sep 2018 15:58:57 GMT

I have a DAG where the input size (rows) may grow or shrink significantly.

The first step (A) determines the size of the input set and groups into
batches of a pre-defined size.
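
As a rough illustration of step A, the grouping could look something like the sketch below. The name `make_batches` and the sample rows are hypothetical; the real input would come from whatever query determines the input set.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(rows: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `rows`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        # The last slice may be shorter than batch_size.
        yield list(rows[start:start + batch_size])


# Hypothetical input: ten rows grouped into batches of four.
batches = list(make_batches(list(range(10)), batch_size=4))
```

The number of batches, `len(batches)`, is the number of upload tasks the second step would need.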

In the second step I want to generate one task per batch, each performing an
upload to a third-party API (Google AdWords) / computation.

The final step is a sensor that waits for the status of the batch to be
completed, followed by a final task.

Thoughts so far:

- I don't necessarily need all tasks to execute in parallel; I just want to
be able to control how many do, through Pools
- I could potentially calculate the batch size and number of tasks required
at DAG compile time but this would make my DAG loading very slow (as I will
have lots of DAGs doing this)
- Is changing the number of tasks in a DAG dynamically going to screw up
- I found this but it feels a bit of a
- I could trigger multiple dagruns but this makes it harder to visualise
and trace through the UI
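
One way to keep DAG loading cheap, sketched under assumptions: step A writes the batch count to a small file, and the DAG file only reads that cached number when it is parsed instead of recomputing it. The file name `batch_count.json`, its `batch_count` key, and both helper names are hypothetical. Each generated id would then be handed to an operator created with a `pool` argument so the pool limits how many uploads run at once; that wiring is left out so the sketch runs without Airflow installed.

```python
import json
import os
import tempfile
from typing import List


def read_batch_count(path: str, default: int = 0) -> int:
    """Read the cached batch count; fall back to `default` if unavailable."""
    try:
        with open(path) as fh:
            return int(json.load(fh)["batch_count"])
    except (OSError, ValueError, KeyError, TypeError):
        # Missing file or malformed content: keep parsing cheap and safe.
        return default


def upload_task_ids(batch_count: int) -> List[str]:
    """Build one stable task id per batch."""
    return ["upload_batch_{}".format(i) for i in range(batch_count)]


# Simulate step A having cached a count of 3 (hypothetical file layout).
cache_path = os.path.join(tempfile.mkdtemp(), "batch_count.json")
with open(cache_path, "w") as fh:
    json.dump({"batch_count": 3}, fh)

task_ids = upload_task_ids(read_batch_count(cache_path))
```

This only addresses the parse-time cost; it does not settle whether changing the number of tasks between runs is safe.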

Or am I approaching this problem in the wrong way?

Thanks for your help,

