falcon-dev mailing list archives

From Ajay Yadav <ajayn...@gmail.com>
Subject Re: [DISCUSS] Orchestration in Falcon
Date Thu, 15 Jan 2015 10:05:48 GMT
+1 for a Confluence page. It will serve as design documentation as well as
a place for the discussion.

On Thu, Jan 15, 2015 at 2:55 PM, Srikanth Sundarrajan <sriksun@hotmail.com>
wrote:

> -dev@f.i.a.o
>
> It looks like we have broad consensus on this. Should we open a discussion
> thread on how we go about this? Or should we create a Confluence page and
> collaborate through that?
>
> Regards
> Srikanth Sundarrajan
>
> > From: psychidris@gmail.com
> > Date: Thu, 1 Jan 2015 22:40:48 +0530
> > Subject: Re: [DISCUSS] Orchestration in Falcon
> > To: dev@falcon.incubator.apache.org
> >
> > +1.
> >
> > A few more relevant asks:
> > 1. Support for a "Last Only" option for process scheduling (in addition to
> >  LIFO/FIFO); currently Oozie has some issues.
> > 2. Support for a singleton process (lock based); the behaviour of all
> > instances of the process is the same.
> >
> > Thanks,
> > -Idris
> >
> >
> > On Thu, Jan 1, 2015 at 7:51 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
> > wrote:
> >
> > > +1
> > >
> > > Regards
> > > JB
> > >
> > >
> > > On 12/31/2014 03:53 PM, Srikanth Sundarrajan wrote:
> > >
> > > >> Can we pick up this thread in the new year when folks are back from
> > > >> break? I am in total agreement with Venkatesh here. We ought to have a
> > > >> long-term, sustainable approach. Also, I feel that getting the
> > > >> capabilities we would like to enable in Falcon done through Oozie in
> > > >> the near term seems to be a tall ask anyway.
> > >>
> > >> Regards
> > >> Srikanth Sundarrajan
> > >>
> > >>  Date: Tue, 23 Dec 2014 16:44:06 -0800
> > >>> Subject: Re: [DISCUSS] Orchestration in Falcon
> > >>> From: venkatesh@innerzeal.com
> > >>> To: dev@falcon.incubator.apache.org
> > >>>
> > > >>> Chugging along with Oozie is bad for Falcon in the long run, for users
> > > >>> and developers. It's horribly complex to work through the many rough
> > > >>> edges architecturally in Oozie. Look at all the patches for security
> > > >>> that I had to fix around Oozie. It's unnecessarily complex,
> > > >>> non-uniform, and is NOT meant to be used by another tool like Falcon
> > > >>> but was built around the end user.
> > > >>>
> > > >>> This is a good discussion to have - maybe explore Oozie for the
> > > >>> short term but look at alternative solutions for the long term.
> > >>>
> > >>> On Tue, Dec 23, 2014 at 7:28 AM, Srikanth Sundarrajan <
> > >>> sriksun@hotmail.com>
> > >>> wrote:
> > >>>
> > > >>>> @jb, There is no doubt merit in mapping them to Oozie if possible and
> > > >>>> if the extensions are simple and straightforward enough.
> > > >>>>
> > > >>>> I also had a quick chat offline with Shwetha, and she mentioned some
> > > >>>> work happening in Oozie in this regard. On further digging, I found
> > > >>>> https://issues.apache.org/jira/browse/OOZIE-1976. This is possibly
> > > >>>> what Shwetha was referring to. From the looks of it, this tries to
> > > >>>> address item #7 in the original thread. Maybe there are more JIRAs
> > > >>>> where additional work, such as aperiodic datasets, is being done.
> > > >>>> Perhaps @Shwetha can shed some light on what is being considered
> > > >>>> and/or how these gating/orchestration use cases can be managed.
> > >>>>
> > >>>> Regards
> > >>>> Srikanth Sundarrajan
> > >>>>
> > >>>>  Date: Tue, 23 Dec 2014 11:06:24 +0100
> > >>>>> From: jb@nanthrax.net
> > >>>>> To: dev@falcon.incubator.apache.org
> > >>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
> > >>>>>
> > >>>>> Hi all,
> > >>>>>
> > > >>>>> I second Shwetha there. I think we can achieve such features in Oozie
> > > >>>>> (with some adaptations).
> > >>>>>
> > >>>>> Regards
> > >>>>> JB
> > >>>>>
> > > >>>>> On 2014-12-23 10:53, Shwetha G S wrote:
> > >>>>>
> > > >>>>>> If we can get rid of Oozie entirely, yes, we can explore other
> > > >>>>>> possibilities. But if we are still going to use Oozie for DAG
> > > >>>>>> execution, we are going to add another bottleneck in the whole
> > > >>>>>> execution (currently, Falcon is not in the workflow execution path),
> > > >>>>>> and I don't think it's worth it.
> > > >>>>>>
> > > >>>>>> The features that are outlined above are all available in basic
> > > >>>>>> forms in Oozie, and it should be easy to enhance them or make them
> > > >>>>>> extension points.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> -Shwetha
> > >>>>>>
> > >>>>>> On Tue, Dec 23, 2014 at 8:12 AM, Srikanth Sundarrajan
> > >>>>>> <sriksun@hotmail.com>
> > >>>>>> wrote:
> > >>>>>>
> > > >>>>>>> Here are a few more gaps that we ought to solve for while we are on
> > > >>>>>>> the subject:
> > > >>>>>>>
> > > >>>>>>> 1. Ability to attach to start & finish events of workflow
> > > >>>>>>> execution. Currently we have a post-processing hook to listen to
> > > >>>>>>> finish events, but we do run into scenarios where there are
> > > >>>>>>> occasional failures with post-processing and there is a potential
> > > >>>>>>> phase lag in learning about the events.
> > > >>>>>>> 2. Strict enforcement of concurrency control, possibly spanning
> > > >>>>>>> process boundaries.
> > > >>>>>>> 3. Ability to tune how backlogs have to be caught up (old instances
> > > >>>>>>> to be given higher priority, newer instances to be given higher
> > > >>>>>>> priority, or some sort of weights to allow both to make progress at
> > > >>>>>>> varying rates). As an alternative, users have asked for routing
> > > >>>>>>> current vs. older instances to different queues.
> > > >>>>>>> 4. Ability to have a notion of non-time-based feed instances and
> > > >>>>>>> related coordination.
> > > >>>>>>> 5. Currently, keeping track of and managing SLAs is also a
> > > >>>>>>> challenge, but with #1 addressed, this might be a lesser concern.
> > >>>>>>>
> > >>>>>>> Regards
> > >>>>>>> Srikanth Sundarrajan
> > >>>>>>>
> > >>>>>>>  Subject: Re: [DISCUSS] Orchestration in Falcon
> > >>>>>>>> From: sriksun@hotmail.com
> > >>>>>>>> Date: Tue, 23 Dec 2014 06:30:30 +0530
> > >>>>>>>> To: dev@falcon.incubator.apache.org
> > >>>>>>>>
> > > >>>>>>>> @venkatesh, the question really is how we enable these gating
> > > >>>>>>>> preconditions. It seems hard enough to add them to Oozie, but I
> > > >>>>>>>> am not familiar enough with Oozie to comment on how hard or easy
> > > >>>>>>>> it is. Like I responded to @ajay on the same thread, if we are
> > > >>>>>>>> to do away with coordination through Oozie, we can follow up
> > > >>>>>>>> this discussion with approaches and design. Though I had Quartz
> > > >>>>>>>> in mind, I wanted to leave that out of the discussion to see if
> > > >>>>>>>> there is consensus for moving away from Oozie coords and
> > > >>>>>>>> implementing them through other means.
> > > >>>>>>>>
> > > >>>>>>>> Sent from my iPhone
> > > >>>>>>>>
> > > >>>>>>>> On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh"
> > > >>>>>>>> <venkatesh@innerzeal.com> wrote:
> > >>>>>>>
> > >>>>>>>>
> > > >>>>>>>>> What is the purpose of this decoupling? Why build this into
> > > >>>>>>>>> Falcon? Scheduling is so common that schedulers are a dime a
> > > >>>>>>>>> dozen today, and they are all extensible with custom triggers.
> > > >>>>>>>>> Making it part of Falcon will suffer the same issues that
> > > >>>>>>>>> Oozie has today.
> > > >>>>>>>>>
> > > >>>>>>>>> I'm sorry but I'm a HUGE -1 to this being built into the
> > > >>>>>>>>> Falcon codebase.
> > > >>>>>>>>>
> > > >>>>>>>>> However, I'm +1 to reusing the Quartz scheduler that already
> > > >>>>>>>>> exists - stand it up outside or embed it like we do for
> > > >>>>>>>>> ActiveMQ.
> > > >>>>>>>>>
> > > >>>>>>>>> Phase 2 - I'd like to see us write a simple DAG execution
> > > >>>>>>>>> layer in YARN, as an app master without a DB that keeps state
> > > >>>>>>>>> on HDFS, as an alternative to Oozie.
> > > >>>>>>>>>
> > > >>>>>>>>> Then we will have a nimble Falcon which can kick ass.
> > >>>>>>>>>
> > > >>>>>>>>> On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan
> > > >>>>>>>>> <sriksun@hotmail.com> wrote:
> > >>>>>>>>>
> > > >>>>>>>>>> Hello Team,
> > > >>>>>>>>>>
> > > >>>>>>>>>> Since its inception, Falcon has used Oozie for process
> > > >>>>>>>>>> orchestration as well as feed life cycle phase executions.
> > > >>>>>>>>>> While this has worked reasonably and allowed us to make
> > > >>>>>>>>>> higher-level capabilities available through Falcon, we are
> > > >>>>>>>>>> increasingly seeing scenarios where this is proving to be a
> > > >>>>>>>>>> limiting factor. In its current form, Falcon relies on Oozie
> > > >>>>>>>>>> for both scheduling and workflow execution, due to which the
> > > >>>>>>>>>> scheduling is limited to time-based/cron-based scheduling
> > > >>>>>>>>>> with additional gating conditions on data availability. This
> > > >>>>>>>>>> also imposes the restriction that datasets be
> > > >>>>>>>>>> periodic/cyclic in nature.
> > > >>>>>>>>>>
> > > >>>>>>>>>> From an orchestration standpoint, it would help if we can
> > > >>>>>>>>>> support standard gating / scheduling primitives via Falcon:
> > > >>>>>>>>>>
> > > >>>>>>>>>> 1. Simple periodic scheduling with no gating conditions
> > > >>>>>>>>>> 2. Cron-based scheduling (day of week, day of the month,
> > > >>>>>>>>>> specific hours and non-periodic) with no gating conditions
> > > >>>>>>>>>> 3. Availability of new data (assuming monotonically
> > > >>>>>>>>>> increasing data version, availability of new versions)
> > > >>>>>>>>>> 4. Changes to existing data (reinstatement - similar to late
> > > >>>>>>>>>> data handling)
> > > >>>>>>>>>> 5. External trigger/notifications
> > > >>>>>>>>>> 6. Availability of specific instances of data declared as
> > > >>>>>>>>>> mandatory dependency
> > > >>>>>>>>>> 7. Availability of a minimum subset of instances of data
> > > >>>>>>>>>> declared as mandatory dependency (for example, at least 10
> > > >>>>>>>>>> hourly instances of a day with 24 instances)
> > > >>>>>>>>>> 8. Valid combinations of the above.
> > > >>>>>>>>>>
> > > >>>>>>>>>> In this context, I would like to propose that we move away
> > > >>>>>>>>>> from Oozie for the orchestration requirements and have them
> > > >>>>>>>>>> implemented natively within Falcon. It will no doubt make
> > > >>>>>>>>>> the Falcon server bulkier and heavier in both code and
> > > >>>>>>>>>> deployment, but it seems that without it, the orchestration
> > > >>>>>>>>>> within Falcon will be limited by the capabilities available
> > > >>>>>>>>>> within Oozie.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Please do note that this suggestion is restricted to the
> > > >>>>>>>>>> scheduling and not to the workflow execution.
> > > >>>>>>>>>>
> > > >>>>>>>>>> I would like to hear from fellow developers and users on
> > > >>>>>>>>>> what your thoughts are. Please do chime in with your views.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Regards
> > > >>>>>>>>>> Srikanth Sundarrajan
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>> Regards,
> > >>>>>>>>> Venkatesh
> > >>>>>>>>>
> > > >>>>>>>>> “Perfection (in design) is achieved not when there is nothing
> > > >>>>>>>>> more to add, but rather when there is nothing more to take away.”
> > >>>>>>>>> - Antoine de Saint-Exupéry
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> Regards,
> > >>> Venkatesh
> > >>>
> > > >>> “Perfection (in design) is achieved not when there is nothing more to
> > > >>> add, but rather when there is nothing more to take away.”
> > >>> - Antoine de Saint-Exupéry
> > >>>
> > >>
> > >>
> > >>
> > > --
> > > Jean-Baptiste Onofré
> > > jbonofre@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> > >
>
>

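For reference alongside the thread, gating condition #7 from the original proposal (a minimum subset of instances, e.g. at least 10 of the 24 hourly instances of a day) could be sketched roughly as below. This is only an illustrative Python sketch, not Falcon or Oozie code: the class `MinInstancesGate`, its fields, and the hourly cadence are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MinInstancesGate:
    """Hypothetical gating condition: satisfied once enough instances exist."""
    window_start: datetime      # start of the day being evaluated
    expected_instances: int     # e.g. 24 hourly instances in a day
    min_required: int           # e.g. at least 10 of them

    def expected_times(self):
        # Instance times covering the window (hourly cadence is an assumption).
        return [self.window_start + timedelta(hours=h)
                for h in range(self.expected_instances)]

    def is_satisfied(self, available):
        # Count only the available instances that belong to this window.
        present = set(available) & set(self.expected_times())
        return len(present) >= self.min_required


day = datetime(2014, 12, 21)
gate = MinInstancesGate(window_start=day, expected_instances=24, min_required=10)
nine = [day + timedelta(hours=h) for h in range(9)]
ten = nine + [day + timedelta(hours=23)]
print(gate.is_satisfied(nine))  # False: only 9 of 24 instances are present
print(gate.is_satisfied(ten))   # True: 10 of 24 instances are present
```

A scheduler built along the lines discussed in the thread would presumably re-evaluate such a gate whenever a new instance arrives, rather than only on a time trigger; how that is wired up is left open here.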