airflow-dev mailing list archives

From Gerard Casas Saez <gcasass...@twitter.com.INVALID>
Subject Re: [AIP-34] Rewrite SubDagOperator
Date Tue, 18 Aug 2020 15:55:42 GMT
Is it not possible to solve this at the UI level? Aka tell dagre to only
add 1 edge to the group instead of to all nodes in the group? No need to do
SubDag behaviour, but just reduce the edges on the graph. Should reduce
load time if I understand correctly.
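
For illustration, here is a minimal sketch of that edge reduction, written in
plain Python rather than the UI's JavaScript. The names
`reduce_cross_group_edges` and `group_of` are hypothetical and are not part of
Airflow or dagre; the sketch only assumes each task can be mapped to the group
that contains it.

```python
from itertools import product


def reduce_cross_group_edges(edges, group_of):
    """Replace task-to-task edges that cross a group boundary with group-level edges.

    edges:    iterable of (upstream_id, downstream_id) pairs
    group_of: dict mapping a task id to its group id (ungrouped tasks are absent)
    """
    reduced = set()
    for upstream, downstream in edges:
        up_group = group_of.get(upstream)
        down_group = group_of.get(downstream)
        if up_group is not None and up_group == down_group:
            # Edge stays inside one group: keep it as-is.
            reduced.add((upstream, downstream))
        else:
            # Edge crosses a boundary: point it at the group instead of the task.
            reduced.add((up_group or upstream, down_group or downstream))
    return reduced


# Two groups of 50 tasks; every task in the second depends on every task in the first.
group_1 = [f"inside_section_1.task-{i + 1}" for i in range(50)]
group_2 = [f"inside_section_2.task-{i + 1}" for i in range(50)]
group_of = {t: "inside_section_1" for t in group_1}
group_of.update({t: "inside_section_2" for t in group_2})

dense_edges = list(product(group_1, group_2))
reduced_edges = reduce_cross_group_edges(dense_edges, group_of)

print(len(dense_edges), "->", len(reduced_edges))  # 2500 -> 1
```

The layout engine would then only have to place one edge between the two
groups, at the cost of no longer showing which individual tasks depend on
each other.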

I would strongly avoid the Dummy operator since it will introduce delays on
operator execution (as it will need to execute 1 dummy operator and that
can be expensive imo).

Overall, though, the proposal looks good. Unless anyone opposes it, I would
move this to vote mode :D

Gerard Casas Saez
Twitter | Cortex | @casassaez <http://twitter.com/casassaez>


On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yuqian1990@gmail.com> wrote:

> Hi, All,
> Here's the updated AIP-34:
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator
> The PR has been fine-tuned with better UI interactions and added
> serialization of TaskGroup: https://github.com/apache/airflow/pull/10153
>
> Here are some experiment results:
> A made-up DAG containing 403 tasks and 5696 edges, grouped as in the code at
> the end of this message. Note that inside_section_2 is intentionally made to
> depend on all tasks in inside_section_1 to generate a large number of edges.
> The observation is that opening the top-level graph is very quick, around
> 270ms. Expanding groups that don't have dense dependencies on other groups is
> also hardly noticeable, e.g. expanding section_1 takes 330ms. The part that
> takes time is expanding both inside_section_1 and inside_section_2. Because
> there are 2500 edges between these two inner groups, it took 63 seconds to
> expand both of them. The majority of the time (more than 62 seconds) is
> actually taken by the layout() function in dagre. In other words, it's very
> fast to add nodes and edges, but laying them out on the graph takes time.
> This issue is not specific to TaskGroup. Without TaskGroup, if a DAG contains
> too many edges, it takes time to lay out the graph too.
>
> On the other hand, a more realistic experiment with a production DAG
> containing about 400 tasks and 700 edges showed that grouping tasks into
> three levels of nested TaskGroup cut the upfront page opening time from
> around 6s to 500ms. (Obviously the time is paid back when the user gradually
> expands all the groups one by one, but normally people don't need to expand
> every group every time so it's still a big saving.) The experiments were
> done on OS X Mojave, 2.2 GHz Intel Core i7, 16GB memory, Chrome.
>
> I can see a few possible improvements to TaskGroup (or how it's used) that
> can be done as a next-step:
> 1). Like Gerard suggested, we can implement lazy-loading. Instead of
> displaying the whole DAG, we can limit the Graph View to show only a single
> TaskGroup, omitting its edges going out to other TaskGroups. This behaviour
> is more like SubDagOperator where users can zoom into/out of a TaskGroup
> and look at only tasks within that TaskGroup as if those are the only tasks
> on the DAG. This can be done with either background javascript calls or by
> making a new get request with filtering parameters. Obviously the downside
> is that it's not as explicit as showing all the dependencies on the graph.
> 2). Users can improve the organization of the DAG themselves to reduce the
> number of edges. E.g. if every task in group2 depends on every task in
> group1, instead of doing group1 >> group2, they can add a DummyOperator in
> between and do this: group1 >> dummy >> group2. This cuts down the number
> of edges significantly and page load becomes much faster.
> 3). If we really want, we can improve the >> operator of TaskGroup to do 2)
> automatically. If it sees that both sides of >> are TaskGroup, it can
> create a DummyOperator on behalf of the user. The downside is that it may
> be too much magic.
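
To put rough numbers on option 2), here is a small sketch in plain Python (no
Airflow needed). It assumes every task in group2 depends on every task in
group1 and that one pass-through node, called `dummy` here, sits between them.
The function names are illustrative only.

```python
def direct_edges(upstream, downstream):
    """Edges when every downstream task depends on every upstream task."""
    return [(u, d) for u in upstream for d in downstream]


def edges_via_hub(upstream, downstream, hub="dummy"):
    """The same dependency, routed through one intermediate node."""
    return [(u, hub) for u in upstream] + [(hub, d) for d in downstream]


group1 = [f"group1.task-{i + 1}" for i in range(50)]
group2 = [f"group2.task-{i + 1}" for i in range(50)]

print(len(direct_edges(group1, group2)))   # 50 * 50 = 2500
print(len(edges_via_hub(group1, group2)))  # 50 + 50 = 100
```

For two groups of 50 tasks this matches the 2500 edges reported in the
experiment above, against 100 edges when a single node sits in between.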
>
> Thanks,
> Qian
>
> # Import paths assumed from the Airflow 2.0 development branch at the time;
> # adjust them for your Airflow version.
> from airflow.models.dag import DAG
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.utils.dates import days_ago
> from airflow.utils.task_group import TaskGroup
>
>
> def create_section():
>     """
>     Create tasks in the outer section.
>     """
>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]
>
>     with TaskGroup("inside_section_1") as inside_section_1:
>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
>
>     with TaskGroup("inside_section_2") as inside_section_2:
>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
>
>     dummies[-1] >> inside_section_1
>     dummies[-2] >> inside_section_2
>     # Every task in inside_section_2 depends on every task in inside_section_1.
>     inside_section_1 >> inside_section_2
>
>
> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
>     start = DummyOperator(task_id="start")
>
>     with TaskGroup("section_1") as section_1:
>         create_section()
>
>     some_other_task = DummyOperator(task_id="some-other-task")
>
>     with TaskGroup("section_2") as section_2:
>         create_section()
>
>     end = DummyOperator(task_id='end')
>
>     start >> section_1 >> some_other_task >> section_2 >> end
>
>
> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez
> <gcasassaez@twitter.com.invalid> wrote:
>
> > Re graph times: that makes sense. Let me know what you find. We may be able
> > to contribute on the lazy loading part.
> >
> > Looking forward to seeing the updated AIP!
> >
> >
> > Gerard Casas Saez
> > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> >
> >
> > On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> >
> > > Permissions granted, let me know if you face any issues.
> > >
> > > On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yuqian1990@gmail.com> wrote:
> > >
> > > > Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
> > > >
> > > > On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > >
> > > > > What's your ID? I.e. if you haven't created an account yet, please
> > > > > create one at https://cwiki.apache.org/confluence/signup.action and
> > > > > send us your ID and we will add permissions.
> > > > >
> > > > > > Thanks. I'll edit the AIP. May I request permission to edit it?
> > > > > > My wiki user email is yuqian1990@gmail.com.
> > > > >
> > > > >
> > > > > On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yuqian1990@gmail.com> wrote:
> > > > >
> > > > > > Re Xinbin: Thanks. I'll edit the AIP. May I request permission to
> > > > > > edit it? My wiki user email is yuqian1990@gmail.com.
> > > > > >
> > > > > > Re Gerard: yes, the UI loads all the nodes as JSON from the
> > > > > > webserver at once. However, it only adds the top-level nodes and
> > > > > > edges to the graph when the Graph View page is first opened, and
> > > > > > then adds the expanded nodes to the graph as the user expands them.
> > > > > > From what I've experienced with DAGs containing around 400 tasks
> > > > > > (not using TaskGroup or SubDagOperator), opening the whole DAG in
> > > > > > Graph View usually takes 5 seconds. Less than 60ms of that is taken
> > > > > > by loading the data from the webserver. The remaining 4.9s+ is
> > > > > > taken by JavaScript functions in dagre-d3.min.js such as
> > > > > > createNodes, createEdgeLabels, etc. and by rendering the graph.
> > > > > > With TaskGroup being used to group tasks into a smaller number of
> > > > > > top-level nodes, the amount of data loaded from the webserver will
> > > > > > remain about the same compared to a flat DAG of the same size, but
> > > > > > the number of nodes and edges that need to be plotted on the graph
> > > > > > can be reduced significantly. So in theory this should reduce the
> > > > > > time it takes to open Graph View even without lazy-loading the data
> > > > > > (I'll experiment to find out). That said, if it comes to a point
> > > > > > where lazy-loading helps, we can still implement it as an
> > > > > > improvement.
> > > > > >
> > > > > > Re James: the Tree View looks as if all the groups are fully
> > > > > > expanded (because under the hood all the tasks are in a single
> > > > > > DAG). I'm less worried about Tree View at the moment because it
> > > > > > already has a mechanism for collapsing tasks by the dependency
> > > > > > tree. That said, the Tree View can definitely be improved too with
> > > > > > TaskGroup (e.g. collapse tasks in the same TaskGroup when Tree View
> > > > > > is first opened).
> > > > > >
> > > > > > For both suggestions, implementing them doesn't require fundamental
> > > > > > changes to the idea. I think we can have a basic working TaskGroup
> > > > > > first, and then improve it incrementally in several PRs as we get
> > > > > > more feedback from the community. What do you think?
> > > > > >
> > > > > > Qian
> > > > > >
> > > > > >
> > > > > > On Wed, Aug 12, 2020 at 9:15 AM James Coder <jcoder01@gmail.com> wrote:
> > > > > >
> > > > > > > I agree this looks great. One question: how does the tree view look?
> > > > > > >
> > > > > > > James Coder
> > > > > > >
> > > > > > > > On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gcasassaez@twitter.com.invalid> wrote:
> > > > > > > >
> > > > > > > > First of all, this is awesome!!
> > > > > > > >
> > > > > > > > Secondly, checking your UI code, it seems you are loading all
> > > > > > > > operators at once. Wondering if we can load them as needed (aka
> > > > > > > > load whenever we click the TaskGroup). Some of our DAGs are so
> > > > > > > > large that they take forever to load on the Graph view, so I'm
> > > > > > > > worried about this still being an issue here. It may be easily
> > > > > > > > solvable by implementing lazy loading of the graph. Not sure how
> > > > > > > > easy it is to implement/add to the UI extension (and I don't
> > > > > > > > want to push for early optimization, as it's the root of all
> > > > > > > > evil).
> > > > > > > > Gerard Casas Saez
> > > > > > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > > > > >
> > > > > > > >
> > > > > > > >> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> Hi Yu,
> > > > > > > >>
> > > > > > > >> Thank you so much for taking this on. I was fairly distracted
> > > > > > > >> previously and I didn't have the time to update the proposal.
> > > > > > > >> In fact, after discussing with Ash, Kaxil and Daniel, the
> > > > > > > >> direction of this AIP has been changed to favor the concept of
> > > > > > > >> TaskGroup instead of rewriting SubDagOperator (though it may
> > > > > > > >> make sense to deprecate SubDag at a future date).
> > > > > > > >>
> > > > > > > >> Your PR is amazing and it has implemented the desired features.
> > > > > > > >> I think we can focus on your new PR instead. Do you mind
> > > > > > > >> updating the AIP based on what you have done in your PR?
> > > > > > > >>
> > > > > > > >> Best,
> > > > > > > >> Bin
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yuqian1990@gmail.com> wrote:
> > > > > > > >>>
> > > > > > > >>> Hi, all, I've added the basic UI changes to my proposed
> > > > > > > >>> implementation of TaskGroup as a UI grouping concept:
> > > > > > > >>> https://github.com/apache/airflow/pull/10153
> > > > > > > >>>
> > > > > > > >>> I think Chris had a pretty good specification of TaskGroup so I'm
> > > > > > > >>> quoting it here. The only thing I don't fully agree with is the
> > > > > > > >>> restriction "... *cannot* have dependencies between a Task in a
> > > > > > > >>> TaskGroup and either a Task in a different TaskGroup or a Task not
> > > > > > > >>> in any group". I think this is overly restrictive. Since TaskGroup
> > > > > > > >>> is a UI concept, tasks can have dependencies on tasks in other
> > > > > > > >>> TaskGroups or not in any TaskGroup. In my PR, this is allowed. The
> > > > > > > >>> graph edges will update accordingly when TaskGroups are
> > > > > > > >>> expanded/collapsed. TaskGroup only helps to make the UI look less
> > > > > > > >>> crowded. Under the hood, everything is still a DAG of tasks and
> > > > > > > >>> edges so things work normally. Here's a screenshot of the UI
> > > > > > > >>> interaction:
> > > > > > > >>> https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif
> > > > > > > >>>
> > > > > > > >>> - Tasks can be added to a TaskGroup
> > > > > > > >>> - You *can* have dependencies between Tasks in the same TaskGroup,
> > > > > > > >>>   but *cannot* have dependencies between a Task in a TaskGroup and
> > > > > > > >>>   either a Task in a different TaskGroup or a Task not in any group
> > > > > > > >>> - You *can* have dependencies between a TaskGroup and either other
> > > > > > > >>>   TaskGroups or Tasks not in any group
> > > > > > > >>> - The UI will by default render a TaskGroup as a single "object",
> > > > > > > >>>   but which you expand or zoom into in some way
> > > > > > > >>> - You'd need some way to determine what the "status" of a TaskGroup
> > > > > > > >>>   was at least for UI display purposes
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> Regarding Jake's comment, I agree it's possible to implement the
> > > > > > > >>> "retrying tasks in a group" pattern he mentioned as an optional
> > > > > > > >>> feature of TaskGroup, although that may go against having TaskGroup
> > > > > > > >>> as a pure UI concept. For the motivating example Jake provided, I
> > > > > > > >>> suggest implementing both SubmitLongRunningJobTask and
> > > > > > > >>> PollJobStatusSensor in a single operator. It can do something like
> > > > > > > >>> what BaseSensorOperator.execute() does in "reschedule" mode, i.e.
> > > > > > > >>> it first executes some code to submit the long-running job to the
> > > > > > > >>> external service and stores the state (e.g. in XCom), then
> > > > > > > >>> reschedules itself. Subsequent runs then poke for the completion
> > > > > > > >>> state.
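
As a rough illustration of that submit-then-poll flow, here is a toy model in
plain Python. It does not use Airflow: `SubmitThenPoll`, `run_once` and the
`state` dict (standing in for XCom) are hypothetical names, and the strings
returned only mimic what a scheduler might act on.

```python
class SubmitThenPoll:
    """Toy model of one task that submits a job once and then polls it."""

    def __init__(self, submit_job, job_is_done, state):
        self.submit_job = submit_job    # callable returning a job id
        self.job_is_done = job_is_done  # callable(job_id) -> bool
        self.state = state              # persistent store, standing in for XCom

    def run_once(self):
        """Run one scheduled attempt and report what should happen next."""
        job_id = self.state.get("job_id")
        if job_id is None:
            # First attempt: submit, remember the job id, and free the slot.
            self.state["job_id"] = self.submit_job()
            return "reschedule"
        # Later attempts: only check for completion.
        return "success" if self.job_is_done(job_id) else "reschedule"


# Fake external service that reports completion on the third status check.
checks = {"count": 0}


def fake_is_done(job_id):
    checks["count"] += 1
    return checks["count"] >= 3


task = SubmitThenPoll(lambda: "job-1", fake_is_done, state={})
outcomes = [task.run_once() for _ in range(4)]
print(outcomes)  # ['reschedule', 'reschedule', 'reschedule', 'success']
```

Because the job id lives in the persistent store, a retry of this single task
can decide whether to resubmit or keep polling, which is the behaviour the
two-task SubDag was being used for.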
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero <jferriero@google.com.invalid> wrote:
> > > > > > > >>>
> > > > > > > >>>> I really like this idea of a TaskGroup container as I think this
> > > > > > > >>>> will be much easier to use than SubDag.
> > > > > > > >>>>
> > > > > > > >>>> I'd like to propose an optional behavior for special retry
> > > > > > > >>>> mechanics via a TaskGroup.retry_all property. This way I could use
> > > > > > > >>>> TaskGroup to replace my favorite use of SubDag: atomically
> > > > > > > >>>> retrying tasks of the pattern "act on external state, then
> > > > > > > >>>> reschedule-poll until the desired state is reached".
> > > > > > > >>>>
> > > > > > > >>>> The motivating use case I have for a SubDag is a very simple
> > > > > > > >>>> two-task group [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > > > > > >>>> I use SubDag because it gives me an easy way to retry the
> > > > > > > >>>> SubmitJobTask if something about the PollJobSensor fails. This
> > > > > > > >>>> pattern would be really nice for jobs that are expected to run a
> > > > > > > >>>> long time (because the sensor can use reschedule mode, freeing up
> > > > > > > >>>> slots) but might fail for a retryable reason. However, using
> > > > > > > >>>> SubDag to meet this use case defeats the purpose because SubDag
> > > > > > > >>>> infamously blocks a "controller" slot for the entire duration:
> > > > > > > >>>> https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10
> > > > > > > >>>> This may feel like a cyclic behavior, but in reality it is very
> > > > > > > >>>> common for a single operator to submit a job and wait until it is
> > > > > > > >>>> done. We could use this case to refactor many operators (e.g. BQ,
> > > > > > > >>>> Dataproc, Dataflow) to be implemented as
> > > > > > > >>>> TaskGroup[SubmitTask >> PollTask] with an optional reschedule mode
> > > > > > > >>>> if the user knows that the job may take a long time.
> > > > > > > >>>>
> > > > > > > >>>> I'd be happy to do the development work on adding this specific
> > > > > > > >>>> retry behavior to TaskGroup once the base concept is implemented,
> > > > > > > >>>> if others in the community would find this a useful feature.
> > > > > > > >>>>
> > > > > > > >>>> Cheers,
> > > > > > > >>>> Jake
> > > > > > > >>>>
> > > > > > > >>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > > > >>>>
> > > > > > > >>>>> All for it :). I think we are getting closer to having regular
> > > > > > > >>>>> planning, making some structured approach to 2.0 and starting a
> > > > > > > >>>>> task force for it soon, so I think this should be perfectly fine
> > > > > > > >>>>> to discuss and even start implementing what's beyond as soon as
> > > > > > > >>>>> we make sure that we are prioritizing 2.0 work.
> > > > > > > >>>>>
> > > > > > > >>>>> J,
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yuqian1990@gmail.com> wrote:
> > > > > > > >>>>>
> > > > > > > >>>>>> Hi Jarek,
> > > > > > > >>>>>>
> > > > > > > >>>>>> I agree we should not change the behaviour of the existing
> > > > > > > >>>>>> SubDagOperator till Airflow 2.1. Is it okay to continue the
> > > > > > > >>>>>> discussion about TaskGroup as a brand new concept/feature
> > > > > > > >>>>>> independent from the existing SubDagOperator? In other words,
> > > > > > > >>>>>> shall we add TaskGroup as a UI grouping concept like Ash
> > > > > > > >>>>>> suggested, and not touch SubDagOperator at all? Whenever we are
> > > > > > > >>>>>> ready with TaskGroup, we then deprecate SubDagOperator in
> > > > > > > >>>>>> Airflow 2.1.
> > > > > > > >>>>>>
> > > > > > > >>>>>> I really like Ash's idea of simplifying the SubDagOperator idea
> > > > > > > >>>>>> into a simple UI grouping concept. I think Xinbin's idea of
> > > > > > > >>>>>> "reattaching all the tasks to the root DAG" is the way to go.
> > > > > > > >>>>>> And I see James pointed out we need some helper functions to
> > > > > > > >>>>>> simplify dependency setting of TaskGroup. Xinbin put up a
> > > > > > > >>>>>> pretty elegant example in his PR
> > > > > > > >>>>>> <https://github.com/apache/airflow/pull/9243>. I think having
> > > > > > > >>>>>> TaskGroup as a UI concept should be a relatively small change.
> > > > > > > >>>>>> We can simplify Xinbin's PR further. So I put up this
> > > > > > > >>>>>> alternative proposal here:
> > > > > > > >>>>>> https://github.com/apache/airflow/pull/10153
> > > > > > > >>>>>>
> > > > > > > >>>>>> I have not done any UI changes due to lack of experience with
> > > > > > > >>>>>> web UI. If anyone's interested, please take a look at the PR.
> > > > > > > >>>>>>
> > > > > > > >>>>>> Qian
> > > > > > > >>>>>>
> > > > > > > >>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > > > >>>>>>
> > > > > > > >>>>>>> Similar point here to the other ideas that are popping up.
> > > > > > > >>>>>>> Maybe we should just focus on completing 2.0 and defer all
> > > > > > > >>>>>>> discussions about further improvements to 2.1? While those are
> > > > > > > >>>>>>> important discussions (and we should continue them in the near
> > > > > > > >>>>>>> future!), I think delivering 2.0 in its current shape should
> > > > > > > >>>>>>> be our focus now.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> J.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>> Hi Daniel
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> I agree that the TaskGroup should have the same API as a DAG
> > > > > > > >>>>>>>> object related to task dependencies, but it will not have
> > > > > > > >>>>>>>> anything related to actual execution or scheduling. I will
> > > > > > > >>>>>>>> update the AIP according to this over the weekend.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>> We could even make a “DAGTemplate” object s.t. when you
> > > > > > > >>>>>>>>> import the object you can import it with parameters to
> > > > > > > >>>>>>>>> determine the shape of the DAG.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> Can you elaborate a bit more on this? Does it serve a similar
> > > > > > > >>>>>>>> purpose to a DAG factory function?
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <daniel.imberman@gmail.com> wrote:
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>> Hi Bin,
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> Why not give the TaskGroup the same API as a DAG object
> > > > > > > >>>>>>>>> (e.g. the bitwise operator for task dependencies)? We could
> > > > > > > >>>>>>>>> even make a “DAGTemplate” object s.t. when you import the
> > > > > > > >>>>>>>>> object you can import it with parameters to determine the
> > > > > > > >>>>>>>>> shape of the DAG.
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > > > >>>>>>>>> The TaskGroup will not take a schedule interval as a
> > > > > > > >>>>>>>>> parameter itself; it depends on the DAG it is attached to.
> > > > > > > >>>>>>>>> In my opinion, the TaskGroup will only contain a group of
> > > > > > > >>>>>>>>> tasks with interdependencies, and the TaskGroup behaves like
> > > > > > > >>>>>>>>> a task. It doesn't contain any execution/scheduling logic
> > > > > > > >>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs etc.)
> > > > > > > >>>>>>>>> like a DAG does.
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>>> For example, there is the scenario that the schedule
> > > > > > > >>>>>>>>>> interval of DAG is 1 hour and the schedule interval of
> > > > > > > >>>>>>>>>> TaskGroup is 20 min.
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> I am curious why you ask this. Is this a use case that you
> > > > > > > >>>>>>>>> want to achieve?
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> Bin
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <thanosxnicholas@gmail.com> wrote:
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>>> Hi Bin,
> > > > > > > >>>>>>>>>> Using TaskGroup, is the schedule interval of the TaskGroup
> > > > > > > >>>>>>>>>> the same as that of the parent DAG? My main concern is
> > > > > > > >>>>>>>>>> whether the schedule interval of the TaskGroup could be
> > > > > > > >>>>>>>>>> different from that of the DAG. For example, there is the
> > > > > > > >>>>>>>>>> scenario that the schedule interval of the DAG is 1 hour
> > > > > > > >>>>>>>>>> and the schedule interval of the TaskGroup is 20 min.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Cheers,
> > > > > > > >>>>>>>>>> Nicholas
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>> Hi Nicholas,
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> I am not sure about the old behavior of SubDagOperator;
> > > > > > > >>>>>>>>>>> maybe it will throw an error? But in the original
> > > > > > > >>>>>>>>>>> proposal, the subdag's schedule_interval will be ignored.
> > > > > > > >>>>>>>>>>> Or if we decide to use TaskGroup to replace SubDag, there
> > > > > > > >>>>>>>>>>> will be no subdag schedule_interval.
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> Bin
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <thanosxnicholas@gmail.com> wrote:
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> Hi Bin,
> > > > > > > >>>>>>>>>>>> Thanks for your good proposal. I was unsure whether the
> > > > > > > >>>>>>>>>>>> schedule interval of a SubDAG can be different from that
> > > > > > > >>>>>>>>>>>> of the parent DAG. I have discussed the schedule
> > > > > > > >>>>>>>>>>>> interval of SubDAG with Jiajie Zhong. If the
> > > > > > > >>>>>>>>>>>> SubDagOperator has a different schedule interval, what
> > > > > > > >>>>>>>>>>>> will happen when the scheduler schedules the parent DAG?
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> Regards,
> > > > > > > >>>>>>>>>>>> Nicholas Jiang
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Thank you Max, Kaxil, and everyone for the feedback!
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> I have rethought the concept of subdag and task groups.
> > > > > > > >>>>>>>>>>>>> I think the better way to approach this is to entirely
> > > > > > > >>>>>>>>>>>>> remove subdag and introduce the concept of TaskGroup,
> > > > > > > >>>>>>>>>>>>> which is a container of tasks along with their
> > > > > > > >>>>>>>>>>>>> dependencies *without the execution/scheduling logic of
> > > > > > > >>>>>>>>>>>>> a DAG*. Its only purpose is to group a list of tasks,
> > > > > > > >>>>>>>>>>>>> but you still need to add it to a DAG for execution.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Here is a small code snippet.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> ```
> > > > > > > >>>>>>>>>>>>> class TaskGroup:
> > > > > > > >>>>>>>>>>>>>     """
> > > > > > > >>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
> > > > > > > >>>>>>>>>>>>>     """
> > > > > > > >>>>>>>>>>>>>     def __init__(self, group_id, default_args):
> > > > > > > >>>>>>>>>>>>>         pass
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> """
> > > > > > > >>>>>>>>>>>>> You can add tasks to a task group similar to adding tasks to a DAG
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> This can be declared in a separate file from the dag file
> > > > > > > >>>>>>>>>>>>> """
> > > > > > > >>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
> > > > > > > >>>>>>>>>>>>> download_group.add_task(task1)
> > > > > > > >>>>>>>>>>>>> task2.dag = download_group
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> with download_group:
> > > > > > > >>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> [task1, task2] >> task3
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> """Add it to a DAG for execution"""
> > > > > > > >>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
> > > > > > > >>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
> > > > > > > >>>>>>>>>>>>>     start = DummyOperator(task_id='start')
> > > > > > > >>>>>>>>>>>>>     start >> download_group
> > > > > > > >>>>>>>>>>>>>     # this is equivalent to
> > > > > > > >>>>>>>>>>>>>     # start >> [task1, task2] >> task3
> > > > > > > >>>>>>>>>>>>> ```
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> With this, we can still reuse a group of tasks
> and
> > > > > > > >> set
> > > > > > > >>>>>>>> dependencies
> > > > > > > >>>>>>>>>>>> between
> > > > > > > >>>>>>>>>>>>> them; it avoids the boilerplate code from using
> > > > > > > >>>>>> SubDagOperator,
> > > > > > > >>>>>>>> and
> > > > > > > >>>>>>>>>> we
> > > > > > > >>>>>>>>>>>> can
> > > > > > > >>>>>>>>>>>>> declare dependencies as `task >> task_group >>
> > task`.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> User migration wise, we can introduce it before
> > > > > > > >> Airflow
> > > > > > > >>>> 2.0
> > > > > > > >>>>>> and
> > > > > > > >>>>>>>>> allow
> > > > > > > >>>>>>>>>>>>> gradual transition. Then we can decide if we
> still
> > > > > > > >> want
> > > > > > > >>>> to
> > > > > > > >>>>>> keep
> > > > > > > >>>>>>>> the
> > > > > > > >>>>>>>>>>>>> SubDagOperator or simply remove it.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Any thoughts?
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Cheers,
> > > > > > > >>>>>>>>>>>>> Bin
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime
> Beauchemin <
> > > > > > > >>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> +1, proposal looks good.
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>> The original intention was really to have task
> > > > > > > >>> groups
> > > > > > > >>>>> and
> > > > > > > >>>>>> a
> > > > > > > >>>>>>>>>>>> zoom-in/out
> > > > > > > >>>>>>>>>>>>> in
> > > > > > > >>>>>>>>>>>>>> the UI. The original reasoning was to reuse the
> > DAG
> > > > > > > >>>>> object
> > > > > > > >>>>>>>> since
> > > > > > > >>>>>>>>> it
> > > > > > > >>>>>>>>>>> is
> > > > > > > >>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>> group of tasks, but as highlighted here it does
> > > > > > > >>> create
> > > > > > > >>>>>>>> underlying
> > > > > > > >>>>>>>>>>>>>> confusions since a DAG is much more than just a
> > > > > > > >> group
> > > > > > > >>>> of
> > > > > > > >>>>>>> tasks.
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> Max
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > > > > > > >>>>>>>>>>>>> joshipoornima06@gmail.com>
> > > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> Thank you for your email.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > > > > > > >>>>>>>>>>> bin.huangxb@gmail.com
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This
> > > > > > > >>>>>> rewrites
> > > > > > > >>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > > > >>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> > > > > > > >> it
> > > > > > > >>>>> will
> > > > > > > >>>>>>>> give a
> > > > > > > >>>>>>>>>>>> flat
> > > > > > > >>>>>>>>>>>>>>>>>> structure at
> > > > > > > >>>>>>>>>>>>>>>>>> the task level
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> The serialized_dag representation already
> > > > > > > >> does
> > > > > > > >>>>> this I
> > > > > > > >>>>>>>>> think.
> > > > > > > >>>>>>>>>> At
> > > > > > > >>>>>>>>>>>>> least
> > > > > > > >>>>>>>>>>>>>>> if
> > > > > > > >>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> I am not sure about serialized_dag
> > > > > > > >>> representation,
> > > > > > > >>>>> but
> > > > > > > >>>>>> at
> > > > > > > >>>>>>>>> least
> > > > > > > >>>>>>>>>>> it
> > > > > > > >>>>>>>>>>>>> will
> > > > > > > >>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table?
> > > > > > > >> In
> > > > > > > >>> my
> > > > > > > >>>>>>>> proposal
> > > > > > > >>>>>>>>> as
> > > > > > > >>>>>>>>>>>> also
> > > > > > > >>>>>>>>>>>>> in
> > > > > > > >>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks
> > > > > > > >> from
> > > > > > > >>>> the
> > > > > > > >>>>>>> subdag
> > > > > > > >>>>>>>>> and
> > > > > > > >>>>>>>>>>> add
> > > > > > > >>>>>>>>>>>>>> them
> > > > > > > >>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG
> graph
> > > > > > > >>>> will
> > > > > > > >>>>>> look
> > > > > > > >>>>>>>>>> exactly
> > > > > > > >>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>> same as without subdag but with metadata
> > > > > > > >> attached
> > > > > > > >>>> to
> > > > > > > >>>>>>> those
> > > > > > > >>>>>>>>>>>> sections.
> > > > > > > >>>>>>>>>>>>>>> These
> > > > > > > >>>>>>>>>>>>>>>> metadata will be later on used to render in
> the
> > > > > > > >>> UI.
> > > > > > > >>>>> So
> > > > > > > >>>>>>>> after
> > > > > > > >>>>>>>>>>>> parsing
> > > > > > > >>>>>>>>>>>>> (
> > > > > > > >>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output
> > > > > > > >> the
> > > > > > > >>>>>>> *root_dag
> > > > > > > >>>>>>>>>>>> *instead
> > > > > > > >>>>>>>>>>>>> of
> > > > > > > >>>>>>>>>>>>>>> *root_dag +
> > > > > > > >>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> > > > > > > >>>>>>>>>> current_group=section-1,
> > > > > > > >>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for
> > > > > > > >>> naming
> > > > > > > >>>>>>>>>>> suggestions),
> > > > > > > >>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>> reason for parent_group is that we can have
> > > > > > > >>> nested
> > > > > > > >>>>>> group
> > > > > > > >>>>>>>> and
> > > > > > > >>>>>>>>>>>> still
> > > > > > > >>>>>>>>>>>>>> be
> > > > > > > >>>>>>>>>>>>>>>> able to capture the dependency.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> Runtime DAG:
> > > > > > > >>>>>>>>>>>>>>>> [image: image.png]
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> While at the UI, what we see would be
> something
> > > > > > > >>>> like
> > > > > > > >>>>>> this
> > > > > > > >>>>>>>> by
> > > > > > > >>>>>>>>>>>>> utilizing
> > > > > > > >>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into
> > > > > > > >> in
> > > > > > > >>>> some
> > > > > > > >>>>>>> way.
> > > > > > > >>>>>>>>>>>>>>>> [image: image.png]
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>> The benefits I can see are:
> > > > > > > >>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
> > > > > > > >>> complexity
> > > > > > > >>>> of
> > > > > > > >>>>>>>> SubDag
> > > > > > > >>>>>>>>>> for
> > > > > > > >>>>>>>>>>>>>>> execution
> > > > > > > >>>>>>>>>>>>>>>> and scheduling. It will be the same as not
> > > > > > > >> using
> > > > > > > >>>>>> SubDag.
> > > > > > > >>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and
> > > > > > > >>>>> reusable
> > > > > > > >>>>>>> dag
> > > > > > > >>>>>>>>> code
> > > > > > > >>>>>>>>>>> and
> > > > > > > >>>>>>>>>>>>>>>> declare dependencies between them. And with
> the
> > > > > > > >>> new
> > > > > > > >>>>>>>>>>> SubDagOperator
> > > > > > > >>>>>>>>>>>>> (see
> > > > > > > >>>>>>>>>>>>>>> AIP
> > > > > > > >>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory
> > > > > > > >>>>> function
> > > > > > > >>>>>>> for
> > > > > > > >>>>>>>>>>>>> generating 1
> > > > > > > >>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag
> > > > > > > >>> (in
> > > > > > > >>>>> this
> > > > > > > >>>>>>>> case,
> > > > > > > >>>>>>>>>> it
> > > > > > > >>>>>>>>>>>> will
> > > > > > > >>>>>>>>>>>>>>> just
> > > > > > > >>>>>>>>>>>>>>>> extract all underlying tasks and append to the
> > > > > > > >>> root
> > > > > > > >>>>>> dag).
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
> > > > > > > >>>> with a
> > > > > > > >>>>>>>>>> simpler
> > > > > > > >>>>>>>>>>>>>> concept
> > > > > > > >>>>>>>>>>>>>>>> by Ash: the proposed change basically drains
> > > > > > > >> out
> > > > > > > >>>> the
> > > > > > > >>>>>>>>>> contents
> > > > > > > >>>>>>>>>>>> of
> > > > > > > >>>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>> SubDag
> > > > > > > >>>>>>>>>>>>>>>> and becomes more like
> > > > > > > >>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> > > > > > > >>>>>>>>>>>>>>> (forgive
> > > > > > > > >>>>>>>>>>>>>>>> me about the crazy name..). In this case, is
> it
> > > > > > > >>>> still
> > > > > > > >>>>>>>>>>> necessary
> > > > > > > >>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>> keep the
> > > > > > > >>>>>>>>>>>>>>>> concept of subdag as it is nothing more than a
> > > > > > > >>>> name?
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks
> > > > > > > >>>> Chris
> > > > > > > >>>>>>> Palmer
> > > > > > > >>>>>>>>> for
> > > > > > > >>>>>>>>>>>>> helping
> > > > > > > >>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup,
> I
> > > > > > > >>>> will
> > > > > > > >>>>>> just
> > > > > > > >>>>>>>>> paste
> > > > > > > >>>>>>>>>>> it
> > > > > > > >>>>>>>>>>>>>> here.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > > > > > > >> in
> > > > > > > >>>> the
> > > > > > > >>>>>> same
> > > > > > > >>>>>>>>>>>> TaskGroup,
> > > > > > > >>>>>>>>>>>>>> but
> > > > > > > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> > > > > > > >> a
> > > > > > > >>>>>>> TaskGroup
> > > > > > > >>>>>>>>>> and
> > > > > > > >>>>>>>>>>>>>> either a
> > > > > > > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > > > > > > >> in
> > > > > > > >>>> any
> > > > > > > >>>>>>> group
> > > > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > > > > >>> TaskGroup
> > > > > > > >>>>> and
> > > > > > > >>>>>>>>>> either
> > > > > > > >>>>>>>>>>>>> other
> > > > > > > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > > > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > > > > > >> as
> > > > > > > >>> a
> > > > > > > >>>>>> single
> > > > > > > >>>>>>>>>>>> "object",
> > > > > > > >>>>>>>>>>>>>> but
> > > > > > > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > > > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > > > > > >>>>> "status"
> > > > > > > >>>>>>> of a
> > > > > > > >>>>>>>>>>>>> TaskGroup
> > > > > > > >>>>>>>>>>>>>>> was
> > > > > > > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> I agree with Chris:
> > > > > > > >>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> > > > > > > >>> executor), I
> > > > > > > >>>>>> think
> > > > > > > >>>>>>>>>>> TaskGroup
> > > > > > > >>>>>>>>>>>>>>> should
> > > > > > > >>>>>>>>>>>>>>>> be ignored during execution. (unless we decide
> > > > > > > >> to
> > > > > > > >>>>>>> implement
> > > > > > > >>>>>>>>>> some
> > > > > > > >>>>>>>>>>>>>> metadata
> > > > > > > > >>>>>>>>>>>>>>>> operations that allow starting/stopping a group of
> > > > > > > >>> tasks
> > > > > > > >>>>>> etc.)
> > > > > > > >>>>>>>>>>>>>>>> - From the UI's View, it should be able to
> pick
> > > > > > > >>> up
> > > > > > > >>>>> the
> > > > > > > >>>>>>>>>> individual
> > > > > > > >>>>>>>>>>>>>> tasks'
> > > > > > > >>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> > > > > > > >> status
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> Bin
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> > > > > > > >> Imberman
> > > > > > > >>> <
> > > > > > > >>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>`
> operator
> > > > > > > >>> to
> > > > > > > >>>>> tie
> > > > > > > >>>>>>> dags
> > > > > > > >>>>>>>>>>>> together
> > > > > > > >>>>>>>>>>>>>> but
> > > > > > > >>>>>>>>>>>>>>> I
> > > > > > > >>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if
> we
> > > > > > > >>>> could
> > > > > > > >>>>>>>>>> essentially
> > > > > > > >>>>>>>>>>>>> write
> > > > > > > >>>>>>>>>>>>>>> in
> > > > > > > >>>>>>>>>>>>>>>>> the ability to set dependencies to all
> > > > > > > >>>> starter-tasks
> > > > > > > >>>>>> for
> > > > > > > >>>>>>>>> that
> > > > > > > >>>>>>>>>>> DAG.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly
> > > > > > > >> UI
> > > > > > > >>>>>> concept.
> > > > > > > >>>>>>>> It
> > > > > > > >>>>>>>>>>>> doesn’t
> > > > > > > >>>>>>>>>>>>>> need
> > > > > > > >>>>>>>>>>>>>>>>> to execute separately, you’re just adding
> more
> > > > > > > >>>> tasks
> > > > > > > >>>>>> to
> > > > > > > >>>>>>>> the
> > > > > > > >>>>>>>>>>> queue
> > > > > > > >>>>>>>>>>>>> that
> > > > > > > >>>>>>>>>>>>>>> will
> > > > > > > >>>>>>>>>>>>>>>>> be executed when there are resources
> > > > > > > >> available.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer
> > > > > > > >> <
> > > > > > > >>>>>>>>>>> chris@crpalmer.com
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex
> > > > > > > >>>>>> abstraction.
> > > > > > > >>>>>>> I
> > > > > > > >>>>>>>>>> think
> > > > > > > >>>>>>>>>>>> what
> > > > > > > >>>>>>>>>>>>>> is
> > > > > > > >>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a
> > > > > > > >> high
> > > > > > > >>>>> level
> > > > > > > >>>>>> I
> > > > > > > >>>>>>>>> think
> > > > > > > >>>>>>>>>>> you
> > > > > > > >>>>>>>>>>>>> want
> > > > > > > >>>>>>>>>>>>>>>>> this
> > > > > > > >>>>>>>>>>>>>>>>> functionality:
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> in
> > > > > > > >>> the
> > > > > > > >>>>>> same
> > > > > > > >>>>>>>>>>> TaskGroup,
> > > > > > > >>>>>>>>>>>>> but
> > > > > > > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> a
> > > > > > > >>>>>> TaskGroup
> > > > > > > >>>>>>>> and
> > > > > > > >>>>>>>>>>>> either
> > > > > > > >>>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> in
> > > > > > > >>> any
> > > > > > > >>>>>> group
> > > > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > > > > >>> TaskGroup
> > > > > > > >>>>> and
> > > > > > > >>>>>>>> either
> > > > > > > >>>>>>>>>>> other
> > > > > > > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > > > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > > > > > >> as a
> > > > > > > >>>>>> single
> > > > > > > >>>>>>>>>>> "object",
> > > > > > > >>>>>>>>>>>>> but
> > > > > > > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > > > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > > > > > >>>> "status"
> > > > > > > >>>>>> of
> > > > > > > >>>>>>> a
> > > > > > > >>>>>>>>>>>> TaskGroup
> > > > > > > >>>>>>>>>>>>>> was
> > > > > > > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Not sure if it would need to be a top level
> > > > > > > >>> object
> > > > > > > >>>>>> with
> > > > > > > >>>>>>>> its
> > > > > > > >>>>>>>>>> own
> > > > > > > >>>>>>>>>>>>>> database
> > > > > > > >>>>>>>>>>>>>>>>> table and model or just another attribute on
> > > > > > > >>>> tasks.
> > > > > > > >>>>> I
> > > > > > > >>>>>>>> think
> > > > > > > >>>>>>>>>> you
> > > > > > > >>>>>>>>>>>>> could
> > > > > > > >>>>>>>>>>>>>>>>> build
> > > > > > > > >>>>>>>>>>>>>>>>> it in a way such that from the scheduler's
> > > > > > > >> point
> > > > > > > >>> of
> > > > > > > >>>>>> view
> > > > > > > >>>>>>> a
> > > > > > > >>>>>>>>> DAG
> > > > > > > >>>>>>>>>>> with
> > > > > > > >>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any
> > > > > > > >> differently.
> > > > > > > >>> So
> > > > > > > >>>>> it
> > > > > > > >>>>>>>> really
> > > > > > > >>>>>>>>>>> just
> > > > > > > >>>>>>>>>>>>>>> becomes
> > > > > > > >>>>>>>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>>>> shortcut for setting dependencies between
> sets
> > > > > > > >>> of
> > > > > > > >>>>>> Tasks,
> > > > > > > >>>>>>>> and
> > > > > > > >>>>>>>>>>>> allows
> > > > > > > >>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>> UI
> > > > > > > >>>>>>>>>>>>>>>>> to simplify the render of the DAG structure.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Chris
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > > > > > > >>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> Agree with James (and think it's actually
> > > > > > > >> the
> > > > > > > >>>> more
> > > > > > > >>>>>>>>> important
> > > > > > > >>>>>>>>>>>> issue
> > > > > > > >>>>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>>>> fix),
> > > > > > > > >>>>>>>>>>>>>>>>>> but I am still convinced Ash's idea is the
> > > > > > > >>> right
> > > > > > > >>>>> way
> > > > > > > >>>>>>>>> forward
> > > > > > > >>>>>>>>>>>> (just
> > > > > > > >>>>>>>>>>>>> it
> > > > > > > >>>>>>>>>>>>>>>>> might
> > > > > > > >>>>>>>>>>>>>>>>>> require a bit more work to deprecate than
> > > > > > > >>> adding
> > > > > > > >>>>>>> visual
> > > > > > > >>>>>>>>>>> grouping
> > > > > > > >>>>>>>>>>>>> in
> > > > > > > >>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>> UI).
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> There was a previous thread about this FYI
> > > > > > > >>> with
> > > > > > > >>>>> more
> > > > > > > >>>>>>>>> context
> > > > > > > >>>>>>>>>>> on
> > > > > > > >>>>>>>>>>>>> why
> > > > > > > >>>>>>>>>>>>>>>>> subdags
> > > > > > > >>>>>>>>>>>>>>>>>> are bad and potential solutions:
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>
> > > https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > > > > >>>>>>>>>>>>>> . A
> > > > > > > >>>>>>>>>>>>>>>>>> solution I outline there to Jame's problem
> > > > > > > >> is
> > > > > > > >>>> e.g.
> > > > > > > >>>>>>>>> enabling
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> operator
> > > > > > > >>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs as
> > > > > > > >>>> well. I
> > > > > > > >>>>>> see
> > > > > > > >>>>>>>>> this
> > > > > > > >>>>>>>>>>>> being
> > > > > > > >>>>>>>>>>>>>>>>> separate
> > > > > > > > >>>>>>>>>>>>>>>>>> from Ash's solution for DAG grouping in the
> > > > > > > >> UI
> > > > > > > >>>> but
> > > > > > > >>>>>> one
> > > > > > > >>>>>>> of
> > > > > > > >>>>>>>>> the
> > > > > > > >>>>>>>>>>> two
> > > > > > > >>>>>>>>>>>>>> items
> > > > > > > >>>>>>>>>>>>>>>>>> required to replace all existing subdag
> > > > > > > >>>>>> functionality.
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years
> > > > > > > >> and
> > > > > > > >>>>> they
> > > > > > > >>>>>>> are
> > > > > > > >>>>>>>>>>> always a
> > > > > > > >>>>>>>>>>>>>> giant
> > > > > > > >>>>>>>>>>>>>>>>> pain
> > > > > > > >>>>>>>>>>>>>>>>>> to use. They are a constant source of user
> > > > > > > >>>>> confusion
> > > > > > > >>>>>>> and
> > > > > > > >>>>>>>>>>>> breakages
> > > > > > > >>>>>>>>>>>>>>>>> during
> > > > > > > >>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James
> > > > > > > >> Coder <
> > > > > > > >>>>>>>>>>>> jcoder01@gmail.com>
> > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a
> > > > > > > >> UI
> > > > > > > >>>>>>> concept. I
> > > > > > > >>>>>>>>> use
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>> subdag
> > > > > > > >>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If
> > > > > > > >>> you
> > > > > > > >>>>>> have a
> > > > > > > >>>>>>>>> group
> > > > > > > >>>>>>>>>>> of
> > > > > > > >>>>>>>>>>>>>> tasks
> > > > > > > >>>>>>>>>>>>>>>>> that
> > > > > > > >>>>>>>>>>>>>>>>>>> need to finish before another group of
> > > > > > > >> tasks
> > > > > > > >>>>>> start,
> > > > > > > >>>>>>>>> using
> > > > > > > >>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>> subdag
> > > > > > > >>>>>>>>>>>>>>> is
> > > > > > > >>>>>>>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies
> > > > > > > >>>> and I
> > > > > > > >>>>>>> think
> > > > > > > >>>>>>>>>> also
> > > > > > > >>>>>>>>>>>> make
> > > > > > > >>>>>>>>>>>>>> it
> > > > > > > >>>>>>>>>>>>>>>>>> easier
> > > > > > > >>>>>>>>>>>>>>>>>>> to follow the dag code.
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle
> > > > > > > >> Hamlin
> > > > > > > >>> <
> > > > > > > >>>>>>>>>>>>> hamlin.kn@gmail.com>
> > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash
> > > > > > > >>>>>> Berlin-Taylor
> > > > > > > >>>>>>> <
> > > > > > > >>>>>>>>>>>>>> ash@apache.org
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Question:
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator
> > > > > > > >>>> anymore?
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just
> > > > > > > >>>>> replacing
> > > > > > > >>>>>> it
> > > > > > > >>>>>>>>> with
> > > > > > > >>>>>>>>>> a
> > > > > > > >>>>>>>>>>> UI
> > > > > > > >>>>>>>>>>>>>>>>> grouping
> > > > > > > >>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less
> > > > > > > >> to
> > > > > > > >>>> get
> > > > > > > >>>>>>>> wrong,
> > > > > > > >>>>>>>>>> and
> > > > > > > >>>>>>>>>>>>> closer
> > > > > > > >>>>>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>>>>> what
> > > > > > > >>>>>>>>>>>>>>>>>>>>> users actually want to achieve with
> > > > > > > >>>> subdags?
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in
> > > > > > > >>>> subdags
> > > > > > > >>>>>>> could
> > > > > > > >>>>>>>>>> start
> > > > > > > >>>>>>>>>>>>>> running
> > > > > > > >>>>>>>>>>>>>>> in
> > > > > > > >>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should
> > > > > > > >> we
> > > > > > > >>>> not
> > > > > > > >>>>>>> also
> > > > > > > >>>>>>>>> just
> > > > > > > > >>>>>>>>>>>>>>> _entirely_
> > > > > > > >>>>>>>>>>>>>>>>>>> remove
> > > > > > > >>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace
> > > > > > > >> it
> > > > > > > >>>> with
> > > > > > > >>>>>>>>> something
> > > > > > > >>>>>>>>>>>>>> simpler.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I
> > > > > > > >>> haven't
> > > > > > > >>>>> used
> > > > > > > >>>>>>>> them
> > > > > > > >>>>>>>>>>>>>> extensively
> > > > > > > >>>>>>>>>>>>>>> so
> > > > > > > >>>>>>>>>>>>>>>>>> may
> > > > > > > >>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
> > > > > > > >>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it
> > > > > > > >>>> has(?)
> > > > > > > >>>>> to
> > > > > > > >>>>>>> be
> > > > > > > >>>>>>>> of
> > > > > > > >>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>> form
> > > > > > > >>>>>>>>>>>>>>>>>>>>> `parent_dag_id.subdag_id`.
> > > > > > > >>>>>>>>>>>>>>>>>>>>> - They need their own
> > > > > > > >> schedule_interval,
> > > > > > > >>>> but
> > > > > > > >>>>>> it
> > > > > > > >>>>>>>> has
> > > > > > > >>>>>>>>> to
> > > > > > > >>>>>>>>>>>> match
> > > > > > > >>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>> parent
> > > > > > > >>>>>>>>>>>>>>>>>>>> dag
> > > > > > > >>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own.
> > > > > > > >>>> (Does
> > > > > > > >>>>>> it
> > > > > > > >>>>>>>> make
> > > > > > > >>>>>>>>>>> sense
> > > > > > > >>>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>> do
> > > > > > > >>>>>>>>>>>>>>>>>> this?
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Pausing just a sub dag would mean the
> > > > > > > >>> sub
> > > > > > > >>>>> dag
> > > > > > > >>>>>>>> would
> > > > > > > >>>>>>>>>>> never
> > > > > > > >>>>>>>>>>>>>>>>> execute, so
> > > > > > > > >>>>>>>>>>>>>>>>>>>>> the SubDagOperator would fail too.)
> > > > > > > >>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to
> > > > > > > > >>>>> operate a
> > > > > > > >>>>>>>>> subdag
> > > > > > > >>>>>>>>>>> with
> > > > > > > >>>>>>>>>>>>> --
> > > > > > > >>>>>>>>>>>>>>>>> always
> > > > > > > >>>>>>>>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>>>>>>>> bit of a kludge.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Thoughts?
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> -ash
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash
> > > > > > > >>>>>> Berlin-Taylor <
> > > > > > > >>>>>>>>>>>>>> ash@apache.org>
> > > > > > > >>>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm
> > > > > > > >>>>> excited
> > > > > > > >>>>>> to
> > > > > > > >>>>>>>> see
> > > > > > > >>>>>>>>>> how
> > > > > > > >>>>>>>>>>>>> this
> > > > > > > >>>>>>>>>>>>>>>>>>> progresses.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> > > > > > > >>> parsing*:
> > > > > > > >>>>> This
> > > > > > > >>>>>>>>>> rewrites
> > > > > > > >>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> > > > > > > >>> parsing,
> > > > > > > >>>>> and
> > > > > > > >>>>>> it
> > > > > > > >>>>>>>>> will
> > > > > > > >>>>>>>>>>>> give a
> > > > > > > >>>>>>>>>>>>>>> flat
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation
> > > > > > > >>>> already
> > > > > > > >>>>>> does
> > > > > > > >>>>>>>>> this
> > > > > > > >>>>>>>>>> I
> > > > > > > >>>>>>>>>>>>> think.
> > > > > > > >>>>>>>>>>>>>>> At
> > > > > > > >>>>>>>>>>>>>>>>>> least
> > > > > > > >>>>>>>>>>>>>>>>>>>> if
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> I've understood your idea here
> > > > > > > >>>> correctly.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> -ash
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin
> > > > > > > >>>> Huang <
> > > > > > > >>>>>>>>>>>>>>> bin.huangxb@gmail.com
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone to
> > > > > > > >>>> collect
> > > > > > > >>>>>>>>> feedback
> > > > > > > >>>>>>>>>> on
> > > > > > > >>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>> AIP-34
> > > > > > > >>>>>>>>>>>>>>>>>> on
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was
> > > > > > > >>>>>> previously
> > > > > > > >>>>>>>>>> briefly
> > > > > > > >>>>>>>>>>>>>>>>> mentioned in
> > > > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be
> > > > > > > >>> done
> > > > > > > >>>>> for
> > > > > > > >>>>>>>>> Airflow
> > > > > > > >>>>>>>>>>> 2.0,
> > > > > > > >>>>>>>>>>>>> and
> > > > > > > >>>>>>>>>>>>>>>>> one of
> > > > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator
> > > > > > > >>> attach
> > > > > > > >>>>>> tasks
> > > > > > > >>>>>>>> back
> > > > > > > >>>>>>>>>> to
> > > > > > > >>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>> root
> > > > > > > >>>>>>>>>>>>>>>>> DAG.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving
> > > > > > > >>>>>> SubDagOperator
> > > > > > > >>>>>>>>>> related
> > > > > > > >>>>>>>>>>>>>> issues
> > > > > > > >>>>>>>>>>>>>>> by
> > > > > > > >>>>>>>>>>>>>>>>>>>>> reattaching
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> all tasks back to the root dag
> > > > > > > >> while
> > > > > > > >>>>>>> respecting
> > > > > > > >>>>>>>>>>>>>> dependencies
> > > > > > > >>>>>>>>>>>>>>>>>> during
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> parsing. The original grouping
> > > > > > > >> effect
> > > > > > > >>>> on
> > > > > > > >>>>>> the
> > > > > > > >>>>>>> UI
> > > > > > > >>>>>>>>>> will
> > > > > > > >>>>>>>>>>> be
> > > > > > > >>>>>>>>>>>>>>>>> achieved
> > > > > > > >>>>>>>>>>>>>>>>>>>> through
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> grouping related tasks by metadata.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory
> > > > > > > >>>> function
> > > > > > > >>>>>> more
> > > > > > > >>>>>>>>>>> reusable
> > > > > > > >>>>>>>>>>>>>>> because
> > > > > > > >>>>>>>>>>>>>>>>> you
> > > > > > > >>>>>>>>>>>>>>>>>>>> don't
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and
> > > > > > > >>>>>>> child_dag_name
> > > > > > > >>>>>>>>> in
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>> function
> > > > > > > >>>>>>>>>>>>>>>>>>>>> signature
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> anymore.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag*
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>   method to unpack subdags while parsing, and it will give a flat structure
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>   at the task level.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The new SubDagOperator acts like a container
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>   and most of the original methods are removed. The signature is also
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>   changed to *subdag_factory* with *subdag_args* and *subdag_kwargs*. This
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>   is similar to the PythonOperator signature.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add current_group & parent_group attributes to
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>   BaseOperator*: This metadata is used to group tasks for rendering at the
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>   UI level. It may potentially extend further to group arbitrary tasks
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>   outside the context of subdag to allow group-level operations (i.e.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>   stop/trigger a group of tasks within the dag).
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*: Proposed UI modification to allow
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>   (un)collapsing a group of tasks for a flat structure, to pair with the
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>   first change, instead of the original hierarchical structure.
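A minimal sketch of how the first three proposed changes could fit together, in plain Python with no Airflow imports. The names `Task`, `SubDag`, `unpack`, and `load_tables` are hypothetical stand-ins, not the API from the draft PR, and the way `current_group` and `parent_group` are filled in is an assumption about the intended semantics:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Task:
    task_id: str
    # Hypothetical stand-ins for the proposed BaseOperator attributes.
    current_group: Optional[str] = None
    parent_group: Optional[str] = None


@dataclass
class SubDag:
    """Hypothetical container: it only holds a factory and its arguments."""
    group_id: str
    subdag_factory: Callable[..., List[object]]
    subdag_args: tuple = ()
    subdag_kwargs: dict = field(default_factory=dict)


def unpack(items, current_group=None, parent_group=None):
    """Flatten nested SubDag containers into one list of tagged tasks."""
    flat = []
    for item in items:
        if isinstance(item, SubDag):
            # Call the factory, then recurse one grouping level deeper.
            children = item.subdag_factory(*item.subdag_args, **item.subdag_kwargs)
            flat.extend(
                unpack(children, current_group=item.group_id, parent_group=current_group)
            )
        else:
            # Plain task: record which group it belongs to (assumed semantics).
            item.current_group = current_group
            item.parent_group = parent_group
            flat.append(item)
    return flat


def load_tables(tables):
    # The factory takes only its own arguments; no parent/child dag names.
    return [Task(f"load_{name}") for name in tables]


dag_items = [
    Task("start"),
    SubDag("load", load_tables, subdag_args=(["users", "orders"],)),
    Task("end"),
]
flat_tasks = unpack(dag_items)
```

Here `flat_tasks` is a single flat list, and the grouping survives only as metadata on each task, which is what a UI would need in order to collapse or expand a group.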
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Please see related documents and PRs for details:
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> AIP:
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any aspects that you agree/disagree with or
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> need more clarification (especially the third change regarding TaskGroup).
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Any comments are welcome and I am looking forward to it!
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Cheers
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Bin
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> --
> > > > > > > >>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> --
> > > > > > > >>>>>>>>>>>>>>> Thanks & Regards
> > > > > > > >>>>>>>>>>>>>>> Poornima
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> --
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Jarek Potiuk
> > > > > > > >>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> M: +48 660 796 129
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> --
> > > > > > > >>>>
> > > > > > > >>>> *Jacob Ferriero*
> > > > > > > >>>>
> > > > > > > >>>> Strategic Cloud Engineer: Data Engineering
> > > > > > > >>>>
> > > > > > > >>>> jferriero@google.com
> > > > > > > >>>>
> > > > > > > >>>> 617-714-2509
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
