airflow-dev mailing list archives

From Gerard Casas Saez <gcasass...@twitter.com.INVALID>
Subject Re: [AIP-34] Rewrite SubDagOperator
Date Fri, 21 Aug 2020 17:01:34 GMT
Agree on this being non-blocking.

Regarding moving to vote, you can take care of it. Just open a new email thread
on dev list and call for a vote. You can see this example from Tomek for
AIP-31:
https://lists.apache.org/thread.html/r16ee2b6263f5849a2e2b1b6c3ba39fb3d643022f052216cb19a0a8da%40%3Cdev.airflow.apache.org%3E

Best,


Gerard Casas Saez
Twitter | Cortex | @casassaez <http://twitter.com/casassaez>


On Thu, Aug 20, 2020 at 7:10 PM Yu Qian <yuqian1990@gmail.com> wrote:

> Hi, Gerard, yes I agree it's possible to do this at UI level without any
> fundamental change to the implementation. If expand_group() sees that two
> groups are fully connected (i.e. every task in one parent group depends on
> every task in another parent group), it can decide to collapse all those
> children edges into a single edge between the parent groups to reduce the
> burden of the layout() function. However, I did not find any existing
> algorithm to do this within dagre so we'll likely need to implement this
> ourselves. Another hiccup is that at the moment it doesn't seem to be
> possible to call setEdge() between two parent groups (aka clusters). If
> someone has ideas how to do this please feel free to contribute.
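
The collapsing step described above can be sketched in plain Python, independent of dagre. This is only an illustration under stated assumptions: `collapse_fully_connected` is a hypothetical helper, the groups are assumed to be disjoint sets of task ids, and how the resulting group-to-group edge would actually be drawn between clusters is left open, as noted above.

```python
from itertools import product


def collapse_fully_connected(edges, groups):
    """Replace task-level edges between fully connected groups with one group edge.

    edges:  set of (upstream_task, downstream_task) pairs
    groups: dict mapping a group name to the set of task ids it contains
            (assumed disjoint for this sketch)
    """
    collapsed = set(edges)
    for (up_name, up_tasks), (down_name, down_tasks) in product(groups.items(), repeat=2):
        if up_name == down_name:
            continue
        full = set(product(up_tasks, down_tasks))
        # Collapse only when every upstream/downstream task pair is connected.
        if full and full <= edges:
            collapsed -= full
            collapsed.add((up_name, down_name))
    return collapsed


group1 = {f"g1.task-{i}" for i in range(50)}
group2 = {f"g2.task-{i}" for i in range(50)}
edges = set(product(group1, group2)) | {("start", "g1.task-0")}
reduced = collapse_fully_connected(edges, {"group1": group1, "group2": group2})
print(len(edges), "->", len(reduced))  # 2501 -> 2
```

Partially connected groups are left untouched by this check, which matches the caveat that a single parent-level edge would be inaccurate for them.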
>
> One other consideration is that this example is only an extreme case. There
> are other in-between cases that still require user intervention. Say 90% of
> the tasks in group1 depend on 90% of the tasks in group2 and both groups
> have more than 100 tasks. This will still cause a lot of edges on the graph
> and it's even harder to reduce because the parent groups are not fully
> connected, so it's inaccurate to reduce them to a single edge between the
> parents. In those cases, the user may still need to do something
> themselves, e.g. adding some DummyOperator to the DAG to cut down the
> edges. There will be some tradeoff because DummyOperator takes a short
> while to execute like you mentioned.
>
> There is a lot of room for improvement, but I don't think that's a
> blocking issue for this AIP? So if you can move it to the voting stage
> that'll be fantastic.
>
>
> On Thu, Aug 20, 2020 at 4:21 PM 耀 周 <zhouyao1994@icloud.com.invalid>
> wrote:
>
> > +1
> >
> > > On Aug 18, 2020, at 23:55, Gerard Casas Saez <gcasassaez@twitter.com.INVALID> wrote:
> > >
> > > Is it not possible to solve this at the UI level? Aka tell dagre to only
> > > add 1 edge to the group instead of to all nodes in the group? No need to do
> > > SubDag behaviour, but just reduce the edges on the graph. Should reduce
> > > load time if I understand correctly.
> > >
> > > I would strongly avoid the Dummy operator since it will introduce delays on
> > > operator execution (as it will need to execute 1 dummy operator and that
> > > can be expensive imo).
> > >
> > > Overall though proposal looks good, unless anyone opposes it, I would move
> > > this to vote mode :D
> > >
> > > Gerard Casas Saez
> > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > >
> > >
> > > On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yuqian1990@gmail.com> wrote:
> > >
> > >> Hi, All,
> > >> Here's the updated AIP-34
> > >> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
> > >> The PR has been fine-tuned with better UI interactions and added
> > >> serialization of TaskGroup: https://github.com/apache/airflow/pull/10153
> > >>
> > >> Here are some experiment results:
> > >> A made-up DAG containing 403 tasks and 5696 edges, grouped as in the code
> > >> at the end of this message. Note that inside_section_2 is intentionally
> > >> made to depend on all tasks in inside_section_1 to generate a large number
> > >> of edges. The observation is that opening the top-level graph is very
> > >> quick, around 270ms. Expanding groups that don't have a lot of dense
> > >> dependencies on other groups is also hardly noticeable, e.g. expanding
> > >> section_1 takes 330ms. The part that takes time is expanding both groups
> > >> inside_section_1 and inside_section_2. Because there are 2500 edges between
> > >> these two inner groups, it took 63 seconds to expand both of them. The
> > >> majority of the time (more than 62 seconds) is actually taken by the
> > >> layout() function in dagre. In other words, it's very fast to add nodes
> > >> and edges, but laying them out on the graph takes time. This issue is not
> > >> actually specific to TaskGroup. Without TaskGroup, if a DAG contains too
> > >> many edges, it takes time to lay out the graph too.
> > >>
> > >> On the other hand, a more realistic experiment with a production DAG
> > >> containing about 400 tasks and 700 edges showed that grouping tasks into
> > >> three levels of nested TaskGroup cut the upfront page opening time from
> > >> around 6s to 500ms. (Obviously the time is paid back when the user gradually
> > >> expands all the groups one by one, but normally people don't need to expand
> > >> every group every time so it's still a big saving.) The experiments were
> > >> done on OS X Mojave, 2.2 GHz Intel Core i7, 16GB memory, Chrome.
> > >>
> > >> I can see a few possible improvements to TaskGroup (or how it's used) that
> > >> can be done as a next step:
> > >> 1). Like Gerard suggested, we can implement lazy-loading. Instead of
> > >> displaying the whole DAG, we can limit the Graph View to show only a single
> > >> TaskGroup, omitting its edges going out to other TaskGroups. This behaviour
> > >> is more like SubDagOperator, where users can zoom into/out of a TaskGroup
> > >> and look at only the tasks within that TaskGroup as if those are the only
> > >> tasks on the DAG. This can be done with either background JavaScript calls
> > >> or by making a new GET request with filtering parameters. Obviously the
> > >> downside is that it's not as explicit as showing all the dependencies on
> > >> the graph.
> > >> 2). Users can improve the organization of the DAG themselves to reduce the
> > >> number of edges. E.g. if every task in group2 depends on every task in
> > >> group1, instead of doing group1 >> group2, they can add a DummyOperator in
> > >> between and do this: group1 >> dummy >> group2. This cuts down the number
> > >> of edges significantly and page load becomes much faster.
> > >> 3). If we really want, we can improve the >> operator of TaskGroup to do 2)
> > >> automatically. If it sees that both sides of >> are TaskGroup, it can
> > >> create a DummyOperator on behalf of the user. The downside is that it may
> > >> be too much magic.
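
The saving in option 2) is simple arithmetic, sketched here in plain Python. It assumes every task in the downstream group depends on every task in the upstream group, as in the experiment above; the function names are illustrative only.

```python
def edges_direct(n_upstream, n_downstream):
    # group1 >> group2: one edge per (upstream task, downstream task) pair
    return n_upstream * n_downstream


def edges_via_dummy(n_upstream, n_downstream):
    # group1 >> dummy >> group2: each task has exactly one edge to or from the dummy
    return n_upstream + n_downstream


for n in (10, 50, 100):
    print(n, edges_direct(n, n), edges_via_dummy(n, n))
```

For two groups of 50 tasks this is 2500 edges against 100, which is the 2500 figure reported for inside_section_1 and inside_section_2.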
> > >>
> > >> Thanks,
> > >> Qian
> > >>
> > >> # Import paths as of Airflow 2.0 (assumed; not part of the original message)
> > >> from airflow.models.dag import DAG
> > >> from airflow.operators.dummy import DummyOperator
> > >> from airflow.utils.dates import days_ago
> > >> from airflow.utils.task_group import TaskGroup
> > >>
> > >>
> > >> def create_section():
> > >>     """
> > >>     Create tasks in the outer section.
> > >>     """
> > >>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]
> > >>
> > >>     with TaskGroup("inside_section_1") as inside_section_1:
> > >>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)]
> > >>
> > >>     with TaskGroup("inside_section_2") as inside_section_2:
> > >>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)]
> > >>
> > >>     dummies[-1] >> inside_section_1
> > >>     dummies[-2] >> inside_section_2
> > >>     # Every task in inside_section_2 depends on every task in inside_section_1
> > >>     inside_section_1 >> inside_section_2
> > >>
> > >>
> > >> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> > >>     start = DummyOperator(task_id="start")
> > >>
> > >>     with TaskGroup("section_1") as section_1:
> > >>         create_section()
> > >>
> > >>     some_other_task = DummyOperator(task_id="some-other-task")
> > >>
> > >>     with TaskGroup("section_2") as section_2:
> > >>         create_section()
> > >>
> > >>     end = DummyOperator(task_id='end')
> > >>
> > >>     start >> section_1 >> some_other_task >> section_2 >> end
> > >>
> > >>
> > >> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez
> > >> <gcasassaez@twitter.com.invalid> wrote:
> > >>
> > >>> Re graph times. That makes sense. Let me know what you find. We may be able
> > >>> to contribute on the lazy loading part.
> > >>>
> > >>> Looking forward to seeing the updated AIP!
> > >>>
> > >>>
> > >>> Gerard Casas Saez
> > >>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > >>>
> > >>>
> > >>> On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > >>>
> > >>>> Permissions granted, let me know if you face any issues.
> > >>>>
> > >>>> On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yuqian1990@gmail.com> wrote:
> > >>>>
> > >>>>> Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
> > >>>>>
> > >>>>> On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > >>>>>
> > >>>>>> What's your ID? If you haven't created an account yet, please create
> > >>>>>> one at https://cwiki.apache.org/confluence/signup.action and send us your
> > >>>>>> ID and we will add permissions.
> > >>>>>>
> > >>>>>>> Thanks. I'll edit the AIP. May I request permission to edit it?
> > >>>>>>> My wiki user email is yuqian1990@gmail.com.
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yuqian1990@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> Re, Xinbin. Thanks. I'll edit the AIP. May I request permission to edit it?
> > >>>>>>> My wiki user email is yuqian1990@gmail.com.
> > >>>>>>>
> > >>>>>>> Re Gerard: yes, the UI loads all the nodes as JSON from the webserver at
> > >>>>>>> once. However, it only adds the top-level nodes and edges to the graph
> > >>>>>>> when the Graph View page is first opened, and then adds the expanded nodes
> > >>>>>>> to the graph as the user expands them. From what I've experienced with
> > >>>>>>> DAGs containing around 400 tasks (not using TaskGroup or SubDagOperator),
> > >>>>>>> opening the whole DAG in Graph View usually takes 5 seconds. Less than
> > >>>>>>> 60ms of that is taken by loading the data from the webserver. The
> > >>>>>>> remaining 4.9s+ is taken by JavaScript functions in dagre-d3.min.js such
> > >>>>>>> as createNodes, createEdgeLabels, etc. and by rendering the graph. With
> > >>>>>>> TaskGroup being used to group tasks into a smaller number of top-level
> > >>>>>>> nodes, the amount of data loaded from the webserver will remain about the
> > >>>>>>> same compared to a flat DAG of the same size, but the number of nodes and
> > >>>>>>> edges that need to be plotted on the graph can be reduced significantly.
> > >>>>>>> So in theory this should reduce the time it takes to open Graph View
> > >>>>>>> even without lazy-loading the data (I'll experiment to find out). That
> > >>>>>>> said, if it comes to a point where lazy-loading helps, we can still
> > >>>>>>> implement it as an improvement.
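
Why collapsed groups mean fewer edges to plot can be sketched in plain Python. This is a hypothetical illustration, not the code from the PR: `visible_edges` and the `parent` mapping are assumed names, and the real UI does this in JavaScript. Each task is drawn as its outermost collapsed ancestor, and edges that end up inside one drawn node are dropped.

```python
def visible_edges(edges, parent, expanded):
    """Return the edges to draw for a given set of expanded groups.

    edges:    iterable of (upstream_task, downstream_task) pairs
    parent:   dict mapping a task or group id to its parent group (None at top level)
    expanded: set of group ids the user has expanded
    """
    def representative(node):
        # Collect the node followed by its ancestors, innermost first.
        chain = []
        while node is not None:
            chain.append(node)
            node = parent.get(node)
        # The outermost collapsed ancestor hides everything beneath it.
        for ancestor in reversed(chain[1:]):
            if ancestor not in expanded:
                return ancestor
        return chain[0]

    drawn = set()
    for up, down in edges:
        u, d = representative(up), representative(down)
        if u != d:  # edges inside a collapsed group are not drawn
            drawn.add((u, d))
    return drawn


parent = {"start": None, "end": None, "section_1": None,
          "section_1.a": "section_1", "section_1.b": "section_1"}
edges = [("start", "section_1.a"), ("start", "section_1.b"),
         ("section_1.a", "section_1.b"), ("section_1.b", "end")]

print(sorted(visible_edges(edges, parent, expanded=set())))
print(len(visible_edges(edges, parent, expanded={"section_1"})))
```

With section_1 collapsed, four task-level edges become two drawn edges; expanding it brings back all four.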
> > >>>>>>>
> > >>>>>>> Re James: the Tree View looks as if all the groups are fully expanded
> > >>>>>>> (because under the hood all the tasks are in a single DAG). I'm less
> > >>>>>>> worried about Tree View at the moment because it already has a mechanism
> > >>>>>>> for collapsing tasks by the dependency tree. That said, the Tree View can
> > >>>>>>> definitely be improved too with TaskGroup (e.g. collapse tasks in the
> > >>>>>>> same TaskGroup when Tree View is first opened).
> > >>>>>>>
> > >>>>>>> Neither suggestion requires fundamental changes to the idea. I think we
> > >>>>>>> can have a basic working TaskGroup first, and then improve it
> > >>>>>>> incrementally in several PRs as we get more feedback from the
> > >>>>>>> community. What do you think?
> > >>>>>>>
> > >>>>>>> Qian
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Wed, Aug 12, 2020 at 9:15 AM James Coder <jcoder01@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> I agree this looks great, one question, how does the tree view look?
> > >>>>>>>>
> > >>>>>>>> James Coder
> > >>>>>>>>
> > >>>>>>>>> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gcasassaez@twitter.com.invalid> wrote:
> > >>>>>>>>>
> > >>>>>>>>> First of all, this is awesome!!
> > >>>>>>>>>
> > >>>>>>>>> Secondly, checking your UI code, it seems you are loading all operators
> > >>>>>>>>> at once. Wondering if we can load them as needed (aka load whenever we
> > >>>>>>>>> click the TaskGroup). Some of our DAGs are so large that they take
> > >>>>>>>>> forever to load on the Graph view, so worried about this still being an
> > >>>>>>>>> issue here. It may be easily solvable by implementing lazy loading of
> > >>>>>>>>> the graph. Not sure how easy it is to implement/add to the UI extension
> > >>>>>>>>> (and don't want to push for early optimization as it's the root of all
> > >>>>>>>>> evil).
> > >>>>>>>>> Gerard Casas Saez
> > >>>>>>>>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Hi Yu,
> > >>>>>>>>>>
> > >>>>>>>>>> Thank you so much for taking this on. I was fairly distracted
> > >>>>>>>>>> previously and I didn't have the time to update the proposal. In fact,
> > >>>>>>>>>> after discussing with Ash, Kaxil and Daniel, the direction of this AIP
> > >>>>>>>>>> has been changed to favor the concept of TaskGroup instead of rewriting
> > >>>>>>>>>> SubDagOperator (though it may make sense to deprecate SubDag at a
> > >>>>>>>>>> future date).
> > >>>>>>>>>>
> > >>>>>>>>>> Your PR is amazing and it has implemented the desired features. I think
> > >>>>>>>>>> we can focus on your new PR instead. Do you mind updating the AIP based
> > >>>>>>>>>> on what you have done in your PR?
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Bin
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yuqian1990@gmail.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hi, all, I've added the basic UI changes to my proposed implementation
> > >>>>>>>>>>> of TaskGroup as a UI grouping concept:
> > >>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > >>>>>>>>>>>
> > >>>>>>>>>>> I think Chris had a pretty good specification of TaskGroup so I'm
> > >>>>>>>>>>> quoting it here. The only thing I don't fully agree with is the
> > >>>>>>>>>>> restriction "... *cannot* have dependencies between a Task in a
> > >>>>>>>>>>> TaskGroup and either a Task in a different TaskGroup or a Task not in
> > >>>>>>>>>>> any group". I think this is overly restrictive. Since TaskGroup is a UI
> > >>>>>>>>>>> concept, tasks can have dependencies on tasks in other TaskGroups or
> > >>>>>>>>>>> not in any TaskGroup. In my PR, this is allowed. The graph edges will
> > >>>>>>>>>>> update accordingly when TaskGroups are expanded/collapsed. TaskGroup is
> > >>>>>>>>>>> only helping to make the UI look less crowded. Under the hood,
> > >>>>>>>>>>> everything is still a DAG of tasks and edges so things work normally.
> > >>>>>>>>>>> Here's a screenshot
> > >>>>>>>>>>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
> > >>>>>>>>>>> of the UI interaction.
> > >>>>>>>>>>>
> > >>>>>>>>>>> *   - Tasks can be added to a TaskGroup
> > >>>>>>>>>>>     - You *can* have dependencies between Tasks in the same
> > >>>>>>>>>>>       TaskGroup, but *cannot* have dependencies between a Task in a
> > >>>>>>>>>>>       TaskGroup and either a Task in a different TaskGroup or a Task
> > >>>>>>>>>>>       not in any group
> > >>>>>>>>>>>     - You *can* have dependencies between a TaskGroup and either
> > >>>>>>>>>>>       other TaskGroups or Tasks not in any group
> > >>>>>>>>>>>     - The UI will by default render a TaskGroup as a single "object",
> > >>>>>>>>>>>       but which you expand or zoom into in some way
> > >>>>>>>>>>>     - You'd need some way to determine what the "status" of a
> > >>>>>>>>>>>       TaskGroup was at least for UI display purposes*
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> Regarding Jake's comment, I agree it's possible to implement the
> > >>>>>>>>>>> "retrying tasks in a group" pattern he mentioned as an optional feature
> > >>>>>>>>>>> of TaskGroup, although that may go against having TaskGroup as a pure
> > >>>>>>>>>>> UI concept. For the motivating example Jake provided, I suggest
> > >>>>>>>>>>> implementing both SubmitLongRunningJobTask and PollJobStatusSensor in a
> > >>>>>>>>>>> single operator. It can do something like what
> > >>>>>>>>>>> BaseSensorOperator.execute() does in "reschedule" mode, i.e. it first
> > >>>>>>>>>>> executes some code to submit the long-running job to the external
> > >>>>>>>>>>> service and stores the state (e.g. in XCom), then reschedules itself.
> > >>>>>>>>>>> Subsequent runs then poke for the completion state.
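
The submit-then-poll idea can be sketched without any Airflow imports. Everything here is hypothetical: `SubmitAndPollOperator`, `RescheduleLater` and the `state` dict (standing in for persisted storage such as XCom) are illustrative names, and Airflow's own rescheduling mechanism is not reproduced.

```python
class RescheduleLater(Exception):
    """Hypothetical signal: release the worker slot and call execute() again later."""


class SubmitAndPollOperator:
    """Hypothetical single operator that submits a job once, then polls it."""

    def __init__(self, submit, is_done):
        self.submit = submit      # callable returning a job id
        self.is_done = is_done    # callable(job_id) -> bool

    def execute(self, state):
        # First run: submit the job and remember its id across reschedules.
        if "job_id" not in state:
            state["job_id"] = self.submit()
            raise RescheduleLater
        # Later runs: only poll; never resubmit.
        if not self.is_done(state["job_id"]):
            raise RescheduleLater
        return state["job_id"]


# A toy loop standing in for the scheduler: the job finishes on the third poll.
polls = iter([False, False, True])
op = SubmitAndPollOperator(submit=lambda: "job-1", is_done=lambda job_id: next(polls))
state, attempts = {}, 0
while True:
    attempts += 1
    try:
        result = op.execute(state)
        break
    except RescheduleLater:
        continue
print(result, attempts)  # job-1 4
```

Clearing `state` before a retry would make the operator resubmit the job, which is roughly the "retry the submit if the poll fails" behaviour that motivated the SubDag pattern.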
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero <jferriero@google.com.invalid> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I really like this idea of a TaskGroup container as I think this will
> > >>>>>>>>>>>> be much easier to use than SubDag.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I'd like to propose an optional behavior for special retry mechanics
> > >>>>>>>>>>>> via a TaskGroup.retry_all property.
> > >>>>>>>>>>>> This way I could use TaskGroup to replace my favorite use of SubDag for
> > >>>>>>>>>>>> atomically retrying tasks of the pattern "act on external state then
> > >>>>>>>>>>>> reschedule poll until desired state reached".
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> The motivating use case I have for a SubDag is a very simple two-task
> > >>>>>>>>>>>> group [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > >>>>>>>>>>>> I use SubDag because it gives me an easy way to retry the
> > >>>>>>>>>>>> SubmitJobTask if something about the PollJobSensor fails.
> > >>>>>>>>>>>> This pattern would be really nice for jobs that are expected to run a
> > >>>>>>>>>>>> long time (because the sensor can use reschedule mode, freeing up
> > >>>>>>>>>>>> slots) but might fail for a retryable reason.
> > >>>>>>>>>>>> However, using SubDag to meet this use case defeats the purpose because
> > >>>>>>>>>>>> SubDag infamously
> > >>>>>>>>>>>> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
> > >>>>>>>>>>>> blocks a "controller" slot for the entire duration.
> > >>>>>>>>>>>> This may feel like cyclic behavior but in reality it is very common for
> > >>>>>>>>>>>> a single operator to submit a job / wait til done.
> > >>>>>>>>>>>> We could use this case to refactor many operators (e.g. BQ, Dataproc,
> > >>>>>>>>>>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask] with
> > >>>>>>>>>>>> an optional reschedule mode if the user knows that this job may take a
> > >>>>>>>>>>>> long time.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I'd be happy to do the development work on adding this specific retry
> > >>>>>>>>>>>> behavior to TaskGroup once the base concept is implemented if others in
> > >>>>>>>>>>>> the community would find this a useful feature.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>> Jake
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> All for it :) . I think we are getting closer to having regular
> > >>>>>>>>>>>>> planning, a structured approach to 2.0 and a task force for it soon,
> > >>>>>>>>>>>>> so I think this should be perfectly fine to discuss and even start
> > >>>>>>>>>>>>> implementing what's beyond as soon as we make sure that we are
> > >>>>>>>>>>>>> prioritizing 2.0 work.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> J,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yuqian1990@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi Jarek,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I agree we should not change the behaviour of the existing
> > >>>>>>>>>>>>>> SubDagOperator till Airflow 2.1. Is it okay to continue the discussion
> > >>>>>>>>>>>>>> about TaskGroup as a brand new concept/feature independent from the
> > >>>>>>>>>>>>>> existing SubDagOperator? In other words, shall we add TaskGroup as a
> > >>>>>>>>>>>>>> UI grouping concept like Ash suggested, and not touch SubDagOperator
> > >>>>>>>>>>>>>> at all? Whenever we are ready with TaskGroup, we then deprecate
> > >>>>>>>>>>>>>> SubDagOperator in Airflow 2.1.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I really like Ash's idea of simplifying the SubDagOperator idea into a
> > >>>>>>>>>>>>>> simple UI grouping concept. I think Xinbin's idea of "reattaching all
> > >>>>>>>>>>>>>> the tasks to the root DAG" is the way to go. And I see James pointed
> > >>>>>>>>>>>>>> out we need some helper functions to simplify setting the dependencies
> > >>>>>>>>>>>>>> of a TaskGroup. Xinbin put up a pretty elegant example in his PR
> > >>>>>>>>>>>>>> <https://github.com/apache/airflow/pull/9243>. I think having
> > >>>>>>>>>>>>>> TaskGroup as a UI concept should be a relatively small change. We can
> > >>>>>>>>>>>>>> simplify Xinbin's PR further. So I put up this alternative proposal
> > >>>>>>>>>>>>>> here:
> > >>>>>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I have not done any UI changes due to lack of experience with web UI.
> > >>>>>>>>>>>>>> If anyone's interested, please take a look at the PR.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Qian
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Similar point here to the other ideas that are popping up. Maybe we
> > >>>>>>>>>>>>>>> should just focus on completing 2.0 and move all discussions about
> > >>>>>>>>>>>>>>> further improvements to 2.1? While those are important discussions
> > >>>>>>>>>>>>>>> (and we should continue them in the near future!) I think at this
> > >>>>>>>>>>>>>>> point delivering 2.0 in its current shape should be our focus.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> J.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Hi Daniel
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I agree that the TaskGroup should have the same API as a DAG object
> > >>>>>>>>>>>>>>>> related to task dependencies, but it will not have anything related
> > >>>>>>>>>>>>>>>> to actual execution or scheduling.
> > >>>>>>>>>>>>>>>> I will update the AIP according to this over the weekend.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> We could even make a “DAGTemplate” object s.t. when you import the
> > >>>>>>>>>>>>>>>>> object you can import it with parameters to determine the shape of
> > >>>>>>>>>>>>>>>>> the DAG.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Can you elaborate a bit more on this? Does it serve a similar purpose
> > >>>>>>>>>>>>>>>> as a DAG factory function?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <daniel.imberman@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Hi Bin,
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Why not give the TaskGroup the same API as a DAG object (e.g. the
> > >>>>>>>>>>>>>>>>> bitwise operator for task dependencies)? We could even make a
> > >>>>>>>>>>>>>>>>> “DAGTemplate” object s.t. when you import the object you can import
> > >>>>>>>>>>>>>>>>> it with parameters to determine the shape of the DAG.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>> The TaskGroup will not take a schedule interval as a parameter
> > >>>>>>>>>>>>>>>>> itself; it depends on the DAG it is attached to. In my opinion, the
> > >>>>>>>>>>>>>>>>> TaskGroup will only contain a group of tasks with interdependencies,
> > >>>>>>>>>>>>>>>>> and the TaskGroup behaves like a task. It doesn't contain any
> > >>>>>>>>>>>>>>>>> execution/scheduling logic (i.e. schedule_interval, concurrency,
> > >>>>>>>>>>>>>>>>> max_active_runs etc.) like a DAG does.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> For example, there is the scenario that the schedule interval of
> > >>>>>>>>>>>>>>>>>> DAG is 1 hour and the schedule interval of TaskGroup is 20 min.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I am curious why you ask this. Is this a use case that you want to
> > >>>>>>>>>>>>>>>>> achieve?
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <thanosxnicholas@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Hi Bin,
> > >>>>>>>>>>>>>>>>>> Using TaskGroup, is the schedule interval of TaskGroup the same as
> > >>>>>>>>>>>>>>>>>> the parent DAG? My main concern is whether the schedule interval of
> > >>>>>>>>>>>>>>>>>> TaskGroup could be different from that of the DAG. For example,
> > >>>>>>>>>>>>>>>>>> there is the scenario that the schedule interval of DAG is 1 hour
> > >>>>>>>>>>>>>>>>>> and the schedule interval of TaskGroup is 20 min.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>> Nicholas
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Hi Nicholas,
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> I am not sure about the old behavior of SubDagOperator, maybe it
> > >>>>>>>>>>>>>>>>>>> will throw an error? But in the original proposal, the subdag's
> > >>>>>>>>>>>>>>>>>>> schedule_interval will be ignored. Or if we decide to use
> > >>>>>>>>>>>>>>>>>>> TaskGroup to replace SubDag, there will be no subdag
> > >>>>>>>>>>>>>>>>>>> schedule_interval.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <thanosxnicholas@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Hi Bin,
> > >>>>>>>>>>>>>>>>>>>> Thanks for your good proposal. I was unsure whether the schedule
> > >>>>>>>>>>>>>>>>>>>> interval of a SubDAG can be different from that of the parent
> > >>>>>>>>>>>>>>>>>>>> DAG. I have discussed the schedule interval of SubDAG with Jiajie
> > >>>>>>>>>>>>>>>>>>>> Zhong. If the SubDagOperator has a different schedule interval,
> > >>>>>>>>>>>>>>>>>>>> what will happen when the scheduler schedules the parent DAG?
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>>>>>>>> Nicholas Jiang
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone for your feedback!
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> I have rethought the concept of subdag and task groups. I think
> > >>>>>>>>>>>>>>>>>>>>> the better way to approach this is to entirely remove subdag and
> > >>>>>>>>>>>>>>>>>>>>> introduce the concept of TaskGroup, which is a container of tasks
> > >>>>>>>>>>>>>>>>>>>>> along with their dependencies *without execution/scheduling logic
> > >>>>>>>>>>>>>>>>>>>>> as a DAG*. The only purpose of it is to group a list of tasks,
> > >>>>>>>>>>>>>>>>>>>>> but you still need to add it to a DAG for execution.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Here is a small code snippet.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> ```
> > >>>>>>>>>>>>>>>>>>>>> class TaskGroup:
> > >>>>>>>>>>>>>>>>>>>>>     """
> > >>>>>>>>>>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
> > >>>>>>>>>>>>>>>>>>>>>     """
> > >>>>>>>>>>>>>>>>>>>>>     def __init__(self, group_id, default_args):
> > >>>>>>>>>>>>>>>>>>>>>         pass
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> """
> > >>>>>>>>>>>>>>>>>>>>> You can add tasks to a task group similar to adding tasks to a DAG
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> This can be declared in a separate file from the dag file
> > >>>>>>>>>>>>>>>>>>>>> """
> > >>>>>>>>>>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
> > >>>>>>>>>>>>>>>>>>>>> download_group.add_task(task1)
> > >>>>>>>>>>>>>>>>>>>>> task2.dag = download_group
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> with download_group:
> > >>>>>>>>>>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> [task1, task2] >> task3
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> """Add it to a DAG for execution"""
> > >>>>>>>>>>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
> > >>>>>>>>>>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
> > >>>>>>>>>>>>>>>>>>>>>     start = DummyOperator(task_id='start')
> > >>>>>>>>>>>>>>>>>>>>>     start >> download_group
> > >>>>>>>>>>>>>>>>>>>>>     # this is equivalent to
> > >>>>>>>>>>>>>>>>>>>>>     # start >> [task1, task2] >> task3
> > >>>>>>>>>>>>>>>>>>>>> ```
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> With this, we can still reuse a group of tasks
> > >> and
> > >>>>>>>>>> set
> > >>>>>>>>>>>>>>>> dependencies
> > >>>>>>>>>>>>>>>>>>>> between
> > >>>>>>>>>>>>>>>>>>>>> them; it avoids the boilerplate code from using
> > >>>>>>>>>>>>>> SubDagOperator,
> > >>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>> we
> > >>>>>>>>>>>>>>>>>>>> can
> > >>>>>>>>>>>>>>>>>>>>> declare dependencies as `task >> task_group >>
> > >>> task`.
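
To make the `task >> task_group >> task` idea concrete, here is a minimal toy sketch in plain Python. It does not use Airflow: `Node`, `Task`, `TaskGroup` and `link` are hypothetical names invented for illustration, and treating a group's "roots" and "leaves" as the tasks with no in-group upstream or downstream is an assumption about how the proposal could behave.

```python
# Toy model only: these classes are hypothetical stand-ins, not Airflow APIs.

def _as_nodes(obj):
    """Wrap a single node in a list so lists and single nodes look alike."""
    return list(obj) if isinstance(obj, (list, tuple)) else [obj]


def link(upstream, downstream):
    """Connect every leaf of `upstream` to every root of `downstream`."""
    for up in _as_nodes(upstream):
        for down in _as_nodes(downstream):
            leaves, roots = up.leaves(), down.roots()
            for leaf in leaves:
                for root in roots:
                    leaf.downstream.add(root.task_id)
                    root.upstream.add(leaf.task_id)


class Node:
    def __rshift__(self, other):
        # node >> other
        link(self, other)
        return other

    def __rrshift__(self, other):
        # [a, b] >> node, where the left operand is a plain list
        link(other, self)
        return self


class Task(Node):
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()    # ids of tasks this task waits for
        self.downstream = set()  # ids of tasks that wait for this task

    def roots(self):
        return [self]

    def leaves(self):
        return [self]


class TaskGroup(Node):
    """A container of tasks with no scheduling logic of its own."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.tasks = []

    def add_task(self, task):
        self.tasks.append(task)
        return task

    def roots(self):
        # Tasks with no upstream task inside the group.
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.upstream & ids)]

    def leaves(self):
        # Tasks with no downstream task inside the group.
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.downstream & ids)]


task1, task2, task3 = Task("task1"), Task("task2"), Task("task3")
group = TaskGroup("download")
for t in (task1, task2, task3):
    group.add_task(t)
[task1, task2] >> task3

start, end = Task("start"), Task("end")
start >> group >> end
```

In this sketch the group never appears in the edge sets: `start >> group >> end` only adds edges between individual tasks, which is the "ignored by the scheduler" property discussed in this thread.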
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> User migration wise, we can introduce it before
> > >>>>>>>>>> Airflow
> > >>>>>>>>>>>> 2.0
> > >>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>> allow
> > >>>>>>>>>>>>>>>>>>>>> gradual transition. Then we can decide if we
> > >> still
> > >>>>>>>>>> want
> > >>>>>>>>>>>> to
> > >>>>>>>>>>>>>> keep
> > >>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>> SubDagOperator or simply remove it.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Any thoughts?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime
> > >> Beauchemin <
> > >>>>>>>>>>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> +1, proposal looks good.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> The original intention was really to have tasks
> > >>>>>>>>>>> groups
> > >>>>>>>>>>>>> and
> > >>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>> zoom-in/out
> > >>>>>>>>>>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>>>>>>>> the UI. The original reasoning was to reuse the
> > >>> DAG
> > >>>>>>>>>>>>> object
> > >>>>>>>>>>>>>>>> since
> > >>>>>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>>>> group of tasks, but as highlighted here it does
> > >>>>>>>>>>> create
> > >>>>>>>>>>>>>>>> underlying
> > >>>>>>>>>>>>>>>>>>>>>> confusions since a DAG is much more than just a
> > >>>>>>>>>> group
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>>>> tasks.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Max
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > >>>>>>>>>>>>>>>>>>>>> joshipoornima06@gmail.com>
> > >>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Thank you for your email.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > >>>>>>>>>>>>>>>>>>> bin.huangxb@gmail.com
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This
> > >>>>>>>>>>>>>> rewrites
> > >>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > >>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> > >>>>>>>>>> it
> > >>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>> give a
> > >>>>>>>>>>>>>>>>>>>> flat
> > >>>>>>>>>>>>>>>>>>>>>>>>>> structure at
> > >>>>>>>>>>>>>>>>>>>>>>>>>> the task level
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already
> > >>>>>>>>>> does
> > >>>>>>>>>>>>> this I
> > >>>>>>>>>>>>>>>>> think.
> > >>>>>>>>>>>>>>>>>> At
> > >>>>>>>>>>>>>>>>>>>>> least
> > >>>>>>>>>>>>>>>>>>>>>>> if
> > >>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> I am not sure about serialized_dag
> > >>>>>>>>>>> representation,
> > >>>>>>>>>>>>> but
> > >>>>>>>>>>>>>> at
> > >>>>>>>>>>>>>>>>> least
> > >>>>>>>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table?
> > >>>>>>>>>> In
> > >>>>>>>>>>> my
> > >>>>>>>>>>>>>>>> proposal
> > >>>>>>>>>>>>>>>>> as
> > >>>>>>>>>>>>>>>>>>>> also
> > >>>>>>>>>>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks
> > >>>>>>>>>> from
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>> subdag
> > >>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>> add
> > >>>>>>>>>>>>>>>>>>>>>> them
> > >>>>>>>>>>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG
> > >> graph
> > >>>>>>>>>>>> will
> > >>>>>>>>>>>>>> look
> > >>>>>>>>>>>>>>>>>> exactly
> > >>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>> same as without subdag but with metadata
> > >>>>>>>>>> attached
> > >>>>>>>>>>>> to
> > >>>>>>>>>>>>>>> those
> > >>>>>>>>>>>>>>>>>>>> sections.
> > >>>>>>>>>>>>>>>>>>>>>>> These
> > >>>>>>>>>>>>>>>>>>>>>>>> metadata will be later on used to render in
> > >> the
> > >>>>>>>>>>> UI.
> > >>>>>>>>>>>>> So
> > >>>>>>>>>>>>>>>> after
> > >>>>>>>>>>>>>>>>>>>> parsing
> > >>>>>>>>>>>>>>>>>>>>> (
> > >>>>>>>>>>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output
> > >>>>>>>>>> the
> > >>>>>>>>>>>>>>> *root_dag
> > >>>>>>>>>>>>>>>>>>>> *instead
> > >>>>>>>>>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>>>>>>> *root_dag +
> > >>>>>>>>>>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> > >>>>>>>>>>>>>>>>>> current_group=section-1,
> > >>>>>>>>>>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for
> > >>>>>>>>>>> naming
> > >>>>>>>>>>>>>>>>>>> suggestions),
> > >>>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>> reason for parent_group is that we can have
> > >>>>>>>>>>> nested
> > >>>>>>>>>>>>>> group
> > >>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>> still
> > >>>>>>>>>>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>>>>>>> able to capture the dependency.
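
The `current_group` / `parent_group` metadata described above can be sketched roughly as follows. This is only an illustration on plain dictionaries, not the draft PR's code: the `flatten` helper, the nested `dag_id` / `tasks` / `subdags` layout, and the task names are all assumptions.

```python
# Hypothetical nested structure standing in for a root DAG with subdags.
nested = {
    "dag_id": "root",
    "tasks": ["start", "end"],
    "subdags": [
        {
            "dag_id": "section-1",
            "tasks": ["section-1-task-1", "section-1-task-2"],
            "subdags": [
                {"dag_id": "section-1-inner", "tasks": ["inner-task"], "subdags": []},
            ],
        },
    ],
}


def flatten(dag, parent_group=None):
    """Return one flat list of task records, each tagged with its group metadata."""
    rows = []
    for task_id in dag["tasks"]:
        rows.append(
            {
                "task_id": task_id,
                "current_group": dag["dag_id"],
                "parent_group": parent_group,
            }
        )
    for sub in dag["subdags"]:
        # Tasks of a nested subdag record the enclosing dag as their parent group.
        rows.extend(flatten(sub, parent_group=dag["dag_id"]))
    return rows


flat = flatten(nested)
by_id = {row["task_id"]: row for row in flat}
```

Every task ends up in a single flat list, and the UI could rebuild the nesting by following `parent_group` from any task up to the root.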
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Runtime DAG:
> > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> While at the UI, what we see would be
> > >> something
> > >>>>>>>>>>>> like
> > >>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>> by
> > >>>>>>>>>>>>>>>>>>>>> utilizing
> > >>>>>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into it in some way.
> > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> The benefits I can see are:
> > >>>>>>>>>>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
> > >>>>>>>>>>> complexity
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>> SubDag
> > >>>>>>>>>>>>>>>>>> for
> > >>>>>>>>>>>>>>>>>>>>>>> execution
> > >>>>>>>>>>>>>>>>>>>>>>>> and scheduling. It will be the same as not
> > >>>>>>>>>> using
> > >>>>>>>>>>>>>> SubDag.
> > >>>>>>>>>>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and
> > >>>>>>>>>>>>> reusable
> > >>>>>>>>>>>>>>> dag
> > >>>>>>>>>>>>>>>>> code
> > >>>>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>>>>> declare dependencies between them. And with
> > >> the
> > >>>>>>>>>>> new
> > >>>>>>>>>>>>>>>>>>> SubDagOperator
> > >>>>>>>>>>>>>>>>>>>>> (see
> > >>>>>>>>>>>>>>>>>>>>>>> AIP
> > >>>>>>>>>>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory
> > >>>>>>>>>>>>> function
> > >>>>>>>>>>>>>>> for
> > >>>>>>>>>>>>>>>>>>>>> generating 1
> > >>>>>>>>>>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag
> > >>>>>>>>>>> (in
> > >>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>> case,
> > >>>>>>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>>>>>>>>> just
> > >>>>>>>>>>>>>>>>>>>>>>>> extract all underlying tasks and append to the
> > >>>>>>>>>>> root
> > >>>>>>>>>>>>>> dag).
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
> > >>>>>>>>>>>> with a
> > >>>>>>>>>>>>>>>>>> simpler
> > >>>>>>>>>>>>>>>>>>>>>> concept
> > >>>>>>>>>>>>>>>>>>>>>>>> by Ash: the proposed change basically drains
> > >>>>>>>>>> out
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>> contents
> > >>>>>>>>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>>>>> SubDag
> > >>>>>>>>>>>>>>>>>>>>>>>> and becomes more like
> > >>>>>>>>>>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> > >>>>>>>>>>>>>>>>>>>>>>> (forgive
> > >>>>>>>>>>>>>>>>>>>>>>>> me about the crazy name..). In this case, is it still necessary to keep the
> > >>>>>>>>>>>>>>>>>>>>>>>> concept of subdag, given that it is nothing more than a name?
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks
> > >>>>>>>>>>>> Chris
> > >>>>>>>>>>>>>>> Palmer
> > >>>>>>>>>>>>>>>>> for
> > >>>>>>>>>>>>>>>>>>>>> helping
> > >>>>>>>>>>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup,
> > >> I
> > >>>>>>>>>>>> will
> > >>>>>>>>>>>>>> just
> > >>>>>>>>>>>>>>>>> paste
> > >>>>>>>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>>>>>>>> here.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > >>>>>>>>>> in
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>> same
> > >>>>>>>>>>>>>>>>>>>> TaskGroup,
> > >>>>>>>>>>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> > >>>>>>>>>> a
> > >>>>>>>>>>>>>>> TaskGroup
> > >>>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>>> either a
> > >>>>>>>>>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > >>>>>>>>>> in
> > >>>>>>>>>>>> any
> > >>>>>>>>>>>>>>> group
> > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > >>>>>>>>>>> TaskGroup
> > >>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>> either
> > >>>>>>>>>>>>>>>>>>>>> other
> > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > >>>>>>>>>> as
> > >>>>>>>>>>> a
> > >>>>>>>>>>>>>> single
> > >>>>>>>>>>>>>>>>>>>> "object",
> > >>>>>>>>>>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > >>>>>>>>>>>>> "status"
> > >>>>>>>>>>>>>>> of a
> > >>>>>>>>>>>>>>>>>>>>> TaskGroup
> > >>>>>>>>>>>>>>>>>>>>>>> was
> > >>>>>>>>>>>>>>>>>>>>>>>>> at least for UI display purposes
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> I agree with Chris:
> > >>>>>>>>>>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> > >>>>>>>>>>> executor), I
> > >>>>>>>>>>>>>> think
> > >>>>>>>>>>>>>>>>>>> TaskGroup
> > >>>>>>>>>>>>>>>>>>>>>>> should
> > >>>>>>>>>>>>>>>>>>>>>>>> be ignored during execution. (unless we decide
> > >>>>>>>>>> to
> > >>>>>>>>>>>>>>> implement
> > >>>>>>>>>>>>>>>>>> some
> > >>>>>>>>>>>>>>>>>>>>>> metadata
> > >>>>>>>>>>>>>>>>>>>>>>>> operations that allow starting/stopping a group of
> > >>>>>>>>>>> tasks
> > >>>>>>>>>>>>>> etc.)
> > >>>>>>>>>>>>>>>>>>>>>>>> - From the UI's View, it should be able to
> > >> pick
> > >>>>>>>>>>> up
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>> individual
> > >>>>>>>>>>>>>>>>>>>>>> tasks'
> > >>>>>>>>>>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> > >>>>>>>>>> status
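
One possible way for the UI to derive a group's status from its tasks is a simple priority rule. The sketch below is an assumption, not something agreed in this thread: the `group_status` helper, the `STATE_PRIORITY` ordering, and the `"no_status"` placeholder are all hypothetical.

```python
# Hypothetical ordering: the first state in this list that any task has wins.
STATE_PRIORITY = [
    "failed",
    "upstream_failed",
    "running",
    "queued",
    "scheduled",
    "no_status",
    "success",
    "skipped",
]


def group_status(task_states):
    """Collapse the states of a group's tasks into one state for display."""
    # Tasks that have not run yet are represented by None.
    states = {"no_status" if s is None else s for s in task_states}
    for state in STATE_PRIORITY:
        if state in states:
            return state
    return None  # empty group, or only states outside the list
```

With this ordering a group only shows `success` once every task in it has either succeeded or been skipped.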
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> > >>>>>>>>>> Imberman
> > >>>>>>>>>>> <
> > >>>>>>>>>>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>`
> > >> operator
> > >>>>>>>>>>> to
> > >>>>>>>>>>>>> tie
> > >>>>>>>>>>>>>>> dags
> > >>>>>>>>>>>>>>>>>>>> together
> > >>>>>>>>>>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>>>>>>>> I
> > >>>>>>>>>>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if
> > >> we
> > >>>>>>>>>>>> could
> > >>>>>>>>>>>>>>>>>> essentially
> > >>>>>>>>>>>>>>>>>>>>> write
> > >>>>>>>>>>>>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>>>>>>>>>>> the ability to set dependencies to all
> > >>>>>>>>>>>> starter-tasks
> > >>>>>>>>>>>>>> for
> > >>>>>>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>>>> DAG.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly
> > >>>>>>>>>> UI
> > >>>>>>>>>>>>>> concept.
> > >>>>>>>>>>>>>>>> It
> > >>>>>>>>>>>>>>>>>>>> doesn’t
> > >>>>>>>>>>>>>>>>>>>>>> need
> > >>>>>>>>>>>>>>>>>>>>>>>>> to execute separately, you’re just adding
> > >> more
> > >>>>>>>>>>>> tasks
> > >>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>> queue
> > >>>>>>>>>>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>>>>>>>>>>> be executed when there are resources
> > >>>>>>>>>> available.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer
> > >>>>>>>>>> <
> > >>>>>>>>>>>>>>>>>>> chris@crpalmer.com
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex
> > >>>>>>>>>>>>>> abstraction.
> > >>>>>>>>>>>>>>> I
> > >>>>>>>>>>>>>>>>>> think
> > >>>>>>>>>>>>>>>>>>>> what
> > >>>>>>>>>>>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a
> > >>>>>>>>>> high
> > >>>>>>>>>>>>> level
> > >>>>>>>>>>>>>> I
> > >>>>>>>>>>>>>>>>> think
> > >>>>>>>>>>>>>>>>>>> you
> > >>>>>>>>>>>>>>>>>>>>> want
> > >>>>>>>>>>>>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>>>>>>>>>> functionality:
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > >> in
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>> same
> > >>>>>>>>>>>>>>>>>>> TaskGroup,
> > >>>>>>>>>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> > >> a
> > >>>>>>>>>>>>>> TaskGroup
> > >>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>> either
> > >>>>>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > >> in
> > >>>>>>>>>>> any
> > >>>>>>>>>>>>>> group
> > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > >>>>>>>>>>> TaskGroup
> > >>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>> either
> > >>>>>>>>>>>>>>>>>>> other
> > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > >>>>>>>>>> as a
> > >>>>>>>>>>>>>> single
> > >>>>>>>>>>>>>>>>>>> "object",
> > >>>>>>>>>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > >>>>>>>>>>>> "status"
> > >>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>> TaskGroup
> > >>>>>>>>>>>>>>>>>>>>>> was
> > >>>>>>>>>>>>>>>>>>>>>>>>> at least for UI display purposes
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Not sure if it would need to be a top level
> > >>>>>>>>>>> object
> > >>>>>>>>>>>>>> with
> > >>>>>>>>>>>>>>>> its
> > >>>>>>>>>>>>>>>>>> own
> > >>>>>>>>>>>>>>>>>>>>>> database
> > >>>>>>>>>>>>>>>>>>>>>>>>> table and model or just another attribute on
> > >>>>>>>>>>>> tasks.
> > >>>>>>>>>>>>> I
> > >>>>>>>>>>>>>>>> think
> > >>>>>>>>>>>>>>>>>> you
> > >>>>>>>>>>>>>>>>>>>>> could
> > >>>>>>>>>>>>>>>>>>>>>>>>> build
> > >>>>>>>>>>>>>>>>>>>>>>>>> it in a way such that from the scheduler's
> > >>>>>>>>>> point
> > >>>>>>>>>>> of
> > >>>>>>>>>>>>>> view
> > >>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>> DAG
> > >>>>>>>>>>>>>>>>>>> with
> > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any
> > >>>>>>>>>> differently.
> > >>>>>>>>>>> So
> > >>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>> really
> > >>>>>>>>>>>>>>>>>>> just
> > >>>>>>>>>>>>>>>>>>>>>>> becomes
> > >>>>>>>>>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>>>>>>> shortcut for setting dependencies between
> > >> sets
> > >>>>>>>>>>> of
> > >>>>>>>>>>>>>> Tasks,
> > >>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>> allows
> > >>>>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>> UI
> > >>>>>>>>>>>>>>>>>>>>>>>>> to simplify the render of the DAG structure.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Chris
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > >>>>>>>>>>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> Agree with James (and think it's actually
> > >>>>>>>>>> the
> > >>>>>>>>>>>> more
> > >>>>>>>>>>>>>>>>> important
> > >>>>>>>>>>>>>>>>>>>> issue
> > >>>>>>>>>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>>>>>>> fix),
> > >>>>>>>>>>>>>>>>>>>>>>>>>> but I am still convinced Ash's idea is the
> > >>>>>>>>>>> right
> > >>>>>>>>>>>>> way
> > >>>>>>>>>>>>>>>>> forward
> > >>>>>>>>>>>>>>>>>>>> (just
> > >>>>>>>>>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>>>>>>>>>>> might
> > >>>>>>>>>>>>>>>>>>>>>>>>>> require a bit more work to deprecate than
> > >>>>>>>>>>> adding
> > >>>>>>>>>>>>>>> visual
> > >>>>>>>>>>>>>>>>>>> grouping
> > >>>>>>>>>>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>>>> UI).
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> There was a previous thread about this FYI
> > >>>>>>>>>>> with
> > >>>>>>>>>>>>> more
> > >>>>>>>>>>>>>>>>> context
> > >>>>>>>>>>>>>>>>>>> on
> > >>>>>>>>>>>>>>>>>>>>> why
> > >>>>>>>>>>>>>>>>>>>>>>>>> subdags
> > >>>>>>>>>>>>>>>>>>>>>>>>>> are bad and potential solutions:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>
> > >>>> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > >>>>>>>>>>>>>>>>>>>>>> . A
> > >>>>>>>>>>>>>>>>>>>>>>>>>> solution I outline there to James's problem is e.g. enabling the `>>`
> > >>>>>>>>>>>>>>>>>>>>>>>>>> operator
> > >>>>>>>>>>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs as
> > >>>>>>>>>>>> well. I
> > >>>>>>>>>>>>>> see
> > >>>>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>>>>> being
> > >>>>>>>>>>>>>>>>>>>>>>>>> separate
> > >>>>>>>>>>>>>>>>>>>>>>>>>> from Ash's solution for DAG grouping in the
> > >>>>>>>>>> UI
> > >>>>>>>>>>>> but
> > >>>>>>>>>>>>>> one
> > >>>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>> two
> > >>>>>>>>>>>>>>>>>>>>>> items
> > >>>>>>>>>>>>>>>>>>>>>>>>>> required to replace all existing subdag
> > >>>>>>>>>>>>>> functionality.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years
> > >>>>>>>>>> and
> > >>>>>>>>>>>>> they
> > >>>>>>>>>>>>>>> are
> > >>>>>>>>>>>>>>>>>>> always a
> > >>>>>>>>>>>>>>>>>>>>>> giant
> > >>>>>>>>>>>>>>>>>>>>>>>>> pain
> > >>>>>>>>>>>>>>>>>>>>>>>>>> to use. They are a constant source of user
> > >>>>>>>>>>>>> confusion
> > >>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>> breakages
> > >>>>>>>>>>>>>>>>>>>>>>>>> during
> > >>>>>>>>>>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James
> > >>>>>>>>>> Coder <
> > >>>>>>>>>>>>>>>>>>>> jcoder01@gmail.com>
> > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a
> > >>>>>>>>>> UI
> > >>>>>>>>>>>>>>> concept. I
> > >>>>>>>>>>>>>>>>> use
> > >>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>> subdag
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If
> > >>>>>>>>>>> you
> > >>>>>>>>>>>>>> have a
> > >>>>>>>>>>>>>>>>> group
> > >>>>>>>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>>>>>> tasks
> > >>>>>>>>>>>>>>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> need to finish before another group of
> > >>>>>>>>>> tasks
> > >>>>>>>>>>>>>> start,
> > >>>>>>>>>>>>>>>>> using
> > >>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>>> subdag
> > >>>>>>>>>>>>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies
> > >>>>>>>>>>>> and I
> > >>>>>>>>>>>>>>> think
> > >>>>>>>>>>>>>>>>>> also
> > >>>>>>>>>>>>>>>>>>>> make
> > >>>>>>>>>>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>>>>>>>>>>>> easier
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> to follow the dag code.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle
> > >>>>>>>>>> Hamlin
> > >>>>>>>>>>> <
> > >>>>>>>>>>>>>>>>>>>>> hamlin.kn@gmail.com>
> > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash
> > >>>>>>>>>>>>>> Berlin-Taylor
> > >>>>>>>>>>>>>>> <
> > >>>>>>>>>>>>>>>>>>>>>> ash@apache.org
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Question:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator
> > >>>>>>>>>>>> anymore?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just
> > >>>>>>>>>>>>> replacing
> > >>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>>> with
> > >>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>> UI
> > >>>>>>>>>>>>>>>>>>>>>>>>> grouping
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less
> > >>>>>>>>>> to
> > >>>>>>>>>>>> get
> > >>>>>>>>>>>>>>>> wrong,
> > >>>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>> closer
> > >>>>>>>>>>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>>>>>>>> what
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> users actually want to achieve with
> > >>>>>>>>>>>> subdags?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in
> > >>>>>>>>>>>> subdags
> > >>>>>>>>>>>>>>> could
> > >>>>>>>>>>>>>>>>>> start
> > >>>>>>>>>>>>>>>>>>>>>> running
> > >>>>>>>>>>>>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should
> > >>>>>>>>>> we
> > >>>>>>>>>>>> not
> > >>>>>>>>>>>>>>> also
> > >>>>>>>>>>>>>>>>> just
> > >>>>>>>>>>>>>>>>>>>>>>> _entirely_
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> remove
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace
> > >>>>>>>>>> it
> > >>>>>>>>>>>> with
> > >>>>>>>>>>>>>>>>> something
> > >>>>>>>>>>>>>>>>>>>>>> simpler.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I
> > >>>>>>>>>>> haven't
> > >>>>>>>>>>>>> used
> > >>>>>>>>>>>>>>>> them
> > >>>>>>>>>>>>>>>>>>>>>> extensively
> > >>>>>>>>>>>>>>>>>>>>>>> so
> > >>>>>>>>>>>>>>>>>>>>>>>>>> may
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it
> > >>>>>>>>>>>> has(?)
> > >>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>> form
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> `parent_dag_id.subdag_id`.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own
> > >>>>>>>>>> schedule_interval,
> > >>>>>>>>>>>> but
> > >>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>> has
> > >>>>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>> match
> > >>>>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>>>> parent
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> dag
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own.
> > >>>>>>>>>>>> (Does
> > >>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>> make
> > >>>>>>>>>>>>>>>>>>> sense
> > >>>>>>>>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>>>> do
> > >>>>>>>>>>>>>>>>>>>>>>>>>> this?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Pausing just a sub dag would mean the
> > >>>>>>>>>>> sub
> > >>>>>>>>>>>>> dag
> > >>>>>>>>>>>>>>>> would
> > >>>>>>>>>>>>>>>>>>> never
> > >>>>>>>>>>>>>>>>>>>>>>>>> execute, so
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the SubDagOperator would fail too.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to
> > >>>>>>>>>>>>> operate a
> > >>>>>>>>>>>>>>>>> subdag
> > >>>>>>>>>>>>>>>>>>> with
> > >>>>>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>>>>>>>>> always
> > >>>>>>>>>>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> bit of a kludge.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thoughts?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash
> > >>>>>>>>>>>>>> Berlin-Taylor <
> > >>>>>>>>>>>>>>>>>>>>>> ash@apache.org>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm
> > >>>>>>>>>>>>> excited
> > >>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>> see
> > >>>>>>>>>>>>>>>>>> how
> > >>>>>>>>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> progresses.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> > >>>>>>>>>>> parsing*:
> > >>>>>>>>>>>>> This
> > >>>>>>>>>>>>>>>>>> rewrites
> > >>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> > >>>>>>>>>>> parsing,
> > >>>>>>>>>>>>> and
> > >>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>>>>>> give a
> > >>>>>>>>>>>>>>>>>>>>>>> flat
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> structure at
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the task level
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation
> > >>>>>>>>>>>> already
> > >>>>>>>>>>>>>> does
> > >>>>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>>> I
> > >>>>>>>>>>>>>>>>>>>>> think.
> > >>>>>>>>>>>>>>>>>>>>>>> At
> > >>>>>>>>>>>>>>>>>>>>>>>>>> least
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> if
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here
> > >>>>>>>>>>>> correctly.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin
> > >>>>>>>>>>>> Huang <
> > >>>>>>>>>>>>>>>>>>>>>>> bin.huangxb@gmail.com
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone to
> > >>>>>>>>>>>> collect
> > >>>>>>>>>>>>>>>>> feedback
> > >>>>>>>>>>>>>>>>>> on
> > >>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>>> AIP-34
> > >>>>>>>>>>>>>>>>>>>>>>>>>> on
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was
> > >>>>>>>>>>>>>> previously
> > >>>>>>>>>>>>>>>>>> briefly
> > >>>>>>>>>>>>>>>>>>>>>>>>> mentioned in
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be
> > >>>>>>>>>>> done
> > >>>>>>>>>>>>> for
> > >>>>>>>>>>>>>>>>> Airflow
> > >>>>>>>>>>>>>>>>>>> 2.0,
> > >>>>>>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>>>>>> one of
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator
> > >>>>>>>>>>> attach
> > >>>>>>>>>>>>>> tasks
> > >>>>>>>>>>>>>>>> back
> > >>>>>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>> root
> > >>>>>>>>>>>>>>>>>>>>>>>>> DAG.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving
> > >>>>>>>>>>>>>> SubDagOperator
> > >>>>>>>>>>>>>>>>>> related
> > >>>>>>>>>>>>>>>>>>>>>> issues
> > >>>>>>>>>>>>>>>>>>>>>>> by
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> reattaching
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all tasks back to the root dag
> > >>>>>>>>>> while
> > >>>>>>>>>>>>>>> respecting
> > >>>>>>>>>>>>>>>>>>>>>> dependencies
> > >>>>>>>>>>>>>>>>>>>>>>>>>> during
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> parsing. The original grouping
> > >>>>>>>>>> effect
> > >>>>>>>>>>>> on
> > >>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>> UI
> > >>>>>>>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>>>>>>>> achieved
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> through
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> grouping related tasks by metadata.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory function more reusable because you
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't need to have parent_dag_name and child_dag_name in the function
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> signature anymore.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag* method to unpack subdag while parsing, and it will
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> give a flat structure at the task level
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The new SubDagOperator acts like a
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> container and most of the original methods are removed. The signature
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is also changed to *subdag_factory* with *subdag_args* and
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *subdag_kwargs*. This is similar to the PythonOperator signature.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add current_group & parent_group
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> attributes to BaseOperator*: This metadata is used to group tasks for
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rendering at UI level. It may potentially extend further to group
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> arbitrary tasks outside the context of subdag to allow group-level
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> operations (i.e. stop/trigger a group of tasks within the dag)
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*: Proposed UI modification to allow
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (un)collapse a group of tasks for a flat structure to pair with the
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> first change instead of the original hierarchical structure.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please see related documents and PRs for details:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AIP:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any aspects that you agree/disagree
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with or need more clarification (especially the third change regarding
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TaskGroup). Any comments are welcome and I am looking forward to it!
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>>>>>>> Thanks & Regards
> > >>>>>>>>>>>>>>>>>>>>>>> Poornima
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Jarek Potiuk
> > >>>>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> M: +48 660 796 129
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> --
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> *Jacob Ferriero*
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Strategic Cloud Engineer: Data Engineering
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> jferriero@google.com
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 617-714-2509
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> >
>
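
Editor's sketch, not part of the thread: the flattening idea in the quoted
AIP-34 proposal (reattach every subdag task to the root DAG while keeping
dependencies, and record the group as metadata for the UI) might look
roughly like the following. This is plain Python, not Airflow code. The
Task and SubDag classes, the flatten() helper and the "group_id.task_id"
naming are all assumptions made for illustration; only the parent_group
attribute name comes from the proposal. It also assumes a subdag's own
upstream ids refer to root-level tasks rather than to other groups.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    """Hypothetical stand-in for an operator (not Airflow's BaseOperator)."""
    task_id: str
    upstream: Set[str] = field(default_factory=set)
    parent_group: Optional[str] = None  # metadata only; used for UI grouping


@dataclass
class SubDag:
    """Hypothetical container: a named group of tasks with its own upstream ids."""
    group_id: str
    tasks: List[Task]
    upstream: Set[str] = field(default_factory=set)


def flatten(root_tasks: List[Task], subdags: List[SubDag]) -> Dict[str, Task]:
    """Return one flat task map with subdag tasks reattached at the root level."""
    flat: Dict[str, Task] = {}
    leaves: Dict[str, Set[str]] = {}  # group_id -> prefixed ids of its leaf tasks

    for sub in subdags:
        local_ids = {t.task_id for t in sub.tasks}
        # A task is a leaf if no other task inside the group depends on it.
        depended_on = {u for t in sub.tasks for u in t.upstream if u in local_ids}
        leaves[sub.group_id] = {
            f"{sub.group_id}.{i}" for i in local_ids - depended_on
        }
        for t in sub.tasks:
            inner = {f"{sub.group_id}.{u}" for u in t.upstream if u in local_ids}
            # Entry tasks (no upstream inside the group) inherit the group's upstream.
            deps = inner or set(sub.upstream)
            new_id = f"{sub.group_id}.{t.task_id}"
            flat[new_id] = Task(new_id, deps, parent_group=sub.group_id)

    for t in root_tasks:
        deps: Set[str] = set()
        for u in t.upstream:
            # A dependency on a group becomes a dependency on the group's leaves.
            deps |= leaves.get(u, {u})
        flat[t.task_id] = Task(t.task_id, deps, t.parent_group)

    return flat


if __name__ == "__main__":
    flat = flatten(
        [Task("start"), Task("end", {"section_1"})],
        [SubDag("section_1", [Task("a"), Task("b", {"a"})], {"start"})],
    )
    for task_id in sorted(flat):
        print(task_id, sorted(flat[task_id].upstream), flat[task_id].parent_group)
```

The result is a single-level task map in which the subdag no longer exists
as a separate DAG; the parent_group value is the only trace of the original
grouping, which is what a UI could use to collapse or expand the group.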
