falcon-dev mailing list archives

From Srikanth Sundarrajan <srik...@hotmail.com>
Subject RE: [DISCUSS] Orchestration in Falcon
Date Thu, 15 Jan 2015 09:25:04 GMT
To dev@falcon.incubator.apache.org

It looks like we have broad consensus on this. Should we open a discuss thread on how we
go about this? Or should we create a Confluence page and collaborate through that?

Regards
Srikanth Sundarrajan

> From: psychidris@gmail.com
> Date: Thu, 1 Jan 2015 22:40:48 +0530
> Subject: Re: [DISCUSS] Orchestration in Falcon
> To: dev@falcon.incubator.apache.org
> 
> +1.
> 
> A few more relevant asks:
> 1. Support for a "Last Only" option for process scheduling (in addition to
>  LIFO/FIFO); currently Oozie has some issues.
> 2. Support for a Singleton process (lock based), where the behaviour of all
> instances of the process is the same.
> 
> Thanks,
> -Idris
> 
> 
> On Thu, Jan 1, 2015 at 7:51 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
> wrote:
> 
> > +1
> >
> > Regards
> > JB
> >
> >
> > On 12/31/2014 03:53 PM, Srikanth Sundarrajan wrote:
> >
> >> Can we pick up this thread in the new year when folks are back from
> >> break? I am in total agreement with Venkatesh here. We ought to have a
> >> long-term, sustainable approach. Also, I feel that getting the capabilities
> >> we would like to enable on Falcon done through Oozie in the near term
> >> seems to be a tall ask anyway.
> >>
> >> Regards
> >> Srikanth Sundarrajan
> >>
> >>> Date: Tue, 23 Dec 2014 16:44:06 -0800
> >>> Subject: Re: [DISCUSS] Orchestration in Falcon
> >>> From: venkatesh@innerzeal.com
> >>> To: dev@falcon.incubator.apache.org
> >>>
> >>> Chugging along with Oozie is bad for Falcon in the long run, for users and
> >>> developers. It's horribly complex to work through the many architectural
> >>> rough edges in Oozie. Look at all the patches for security that I had
> >>> to fix around Oozie. It's unnecessarily complex, non-uniform, and is NOT
> >>> meant to be used by another tool like Falcon; it was built around the end
> >>> user.
> >>>
> >>> This is a good discussion to have - maybe explore Oozie for the short term
> >>> but look at alternative solutions for the long term.
> >>>
> >>> On Tue, Dec 23, 2014 at 7:28 AM, Srikanth Sundarrajan <
> >>> sriksun@hotmail.com>
> >>> wrote:
> >>>
> >>>> @jb, There is no doubt merit in mapping them to Oozie if possible and if
> >>>> the extensions are simple and straightforward enough.
> >>>>
> >>>> I also had a quick chat offline with Shwetha, and she mentioned some
> >>>> work happening in Oozie in this regard. On further digging, I found
> >>>> https://issues.apache.org/jira/browse/OOZIE-1976. This is possibly what
> >>>> Shwetha was referring to. From the looks of it, this tries to address item
> >>>> #7 in the original thread. Maybe there are more JIRAs where additional
> >>>> work, such as aperiodic datasets, is being done. Perhaps @Shwetha can
> >>>> throw some light on what is being considered and/or how these
> >>>> gating/orchestration use cases can be managed.
> >>>>
> >>>> Regards
> >>>> Srikanth Sundarrajan
> >>>>
> >>>>> Date: Tue, 23 Dec 2014 11:06:24 +0100
> >>>>> From: jb@nanthrax.net
> >>>>> To: dev@falcon.incubator.apache.org
> >>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I second Shwetha there. I think we can achieve such features in Oozie
> >>>>> (with some adaptations).
> >>>>>
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>> On 2014-12-23 10:53, Shwetha G S wrote:
> >>>>>
> >>>>>> If we can get rid of Oozie entirely, yes, we can explore other
> >>>>>> possibilities. But if we are still going to use Oozie for DAG
> >>>>>> execution, we are going to add another bottleneck in the whole
> >>>>>> execution (currently, Falcon is not in the workflow execution path),
> >>>>>> and I don't think it's worth it.
> >>>>>>
> >>>>>> The features that are outlined above are all available in basic forms
> >>>>>> in Oozie, and it should be easy to enhance them or make them extension
> >>>>>> points.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> -Shwetha
> >>>>>>
> >>>>>> On Tue, Dec 23, 2014 at 8:12 AM, Srikanth Sundarrajan
> >>>>>> <sriksun@hotmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Here are a few more gaps that we ought to solve for while we are on the
> >>>>>>> subject:
> >>>>>>>
> >>>>>>> 1. Ability to attach to start & finish events of workflow execution.
> >>>>>>> Currently we have a post-processing hook to listen to finish events,
> >>>>>>> but we do run into scenarios where there are occasional failures with
> >>>>>>> post-processing, and there is a potential phase lag in learning about
> >>>>>>> the events.
> >>>>>>> 2. Strict enforcement of concurrency control, possibly spanning
> >>>>>>> process boundaries.
> >>>>>>> 3. Ability to tune how backlogs have to be caught up (old instances
> >>>>>>> to be given higher priority, newer instances to be given higher
> >>>>>>> priority, or some sort of weights to allow both to make progress at
> >>>>>>> varying rates). There have been asks from users for routing current
> >>>>>>> vs. older instances to different queues as an alternative.
> >>>>>>> 4. Ability to have a notion of non-time-based feed instances and
> >>>>>>> related coordination.
> >>>>>>> 5. Currently, keeping track of and managing SLAs is also a challenge,
> >>>>>>> but with #1 addressed, this might be a lesser concern.
> >>>>>>>
> >>>>>>> Regards
> >>>>>>> Srikanth Sundarrajan
> >>>>>>>
> >>>>>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
> >>>>>>>> From: sriksun@hotmail.com
> >>>>>>>> Date: Tue, 23 Dec 2014 06:30:30 +0530
> >>>>>>>> To: dev@falcon.incubator.apache.org
> >>>>>>>>
> >>>>>>>> @venkatesh, the question really is how we enable these gating
> >>>>>>>> preconditions. It seems hard enough to add them to Oozie, but I am not
> >>>>>>>> intimately familiar with Oozie, so I cannot comment on how hard or easy
> >>>>>>>> it is. Like I responded to @ajay on the same thread, if we are to do
> >>>>>>>> away with coordination through Oozie, we can follow up this discussion
> >>>>>>>> with approaches and design. Though I had Quartz in mind, I wanted to
> >>>>>>>> leave that out of the discussion to see if there is consensus for
> >>>>>>>> moving away from Oozie coords and implementing them through other
> >>>>>>>> means.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Sent from my iPhone
> >>>>>>>>
> >>>>>>>> On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh" <venkatesh@innerzeal.com> wrote:
> >>>>>>>>
> >>>>>>>>> What is the purpose of this decoupling? Why build this into Falcon?
> >>>>>>>>> Scheduling is so common that there are dime a dozen schedulers today,
> >>>>>>>>> and they are all extensible with custom triggers. Making it part of
> >>>>>>>>> Falcon will suffer the same issues that Oozie has today.
> >>>>>>>>>
> >>>>>>>>> I'm sorry but I'm a HUGE -1 to this being built into the Falcon
> >>>>>>>>> codebase.
> >>>>>>>>>
> >>>>>>>>> However, I'm +1 to reusing the Quartz scheduler that already exists -
> >>>>>>>>> stand it up outside or embed it like we do for ActiveMQ.
> >>>>>>>>>
> >>>>>>>>> Phase 2 - I'd like to see us write a simple DAG execution layer in
> >>>>>>>>> YARN as an app master, without a DB and keeping state on HDFS, as an
> >>>>>>>>> alternative to Oozie.
> >>>>>>>>>
> >>>>>>>>> Then we will have a nimble Falcon which can kick ass.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan <sriksun@hotmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hello Team,
> >>>>>>>>>>
> >>>>>>>>>> Since its inception, Falcon has used Oozie for process orchestration
> >>>>>>>>>> as well as feed life cycle phase executions. While this has worked
> >>>>>>>>>> reasonably and allowed us to make higher-level capabilities available
> >>>>>>>>>> through Falcon, we are increasingly seeing scenarios where this is
> >>>>>>>>>> proving to be a limiting factor. In its current form, Falcon relies on
> >>>>>>>>>> Oozie for both scheduling and workflow execution, due to which
> >>>>>>>>>> scheduling is limited to time-based/cron-based scheduling with
> >>>>>>>>>> additional gating conditions on data availability. This also imposes
> >>>>>>>>>> the restriction that datasets be periodic/cyclic in nature.
> >>>>>>>>>>
> >>>>>>>>>> From an orchestration standpoint, it would help if we could support
> >>>>>>>>>> standard gating/scheduling primitives via Falcon:
> >>>>>>>>>>
> >>>>>>>>>> 1. Simple periodic scheduling with no gating conditions
> >>>>>>>>>> 2. Cron-based scheduling (day of week, day of the month, specific
> >>>>>>>>>> hours, and non-periodic) with no gating conditions
> >>>>>>>>>> 3. Availability of new data (assuming monotonically increasing data
> >>>>>>>>>> versions, availability of new versions)
> >>>>>>>>>> 4. Changes to existing data (reinstatement - similar to late data
> >>>>>>>>>> handling)
> >>>>>>>>>> 5. External triggers/notifications
> >>>>>>>>>> 6. Availability of specific instances of data declared as a mandatory
> >>>>>>>>>> dependency
> >>>>>>>>>> 7. Availability of a minimum subset of instances of data declared as a
> >>>>>>>>>> mandatory dependency (for example, at least 10 hourly instances of a
> >>>>>>>>>> day with 24 instances)
> >>>>>>>>>> 8. Valid combinations of the above.
> >>>>>>>>>>
> >>>>>>>>>> In this context, I would like to propose that we move away from Oozie
> >>>>>>>>>> for the orchestration requirements and have them implemented natively
> >>>>>>>>>> within Falcon. It will no doubt make the Falcon server bulkier and
> >>>>>>>>>> heavier in both code and deployment, but it seems that without it, the
> >>>>>>>>>> orchestration within Falcon will be limited by the capabilities
> >>>>>>>>>> available within Oozie.
> >>>>>>>>>>
> >>>>>>>>>> Please do note that this suggestion is restricted to the scheduling and
> >>>>>>>>>> not to the workflow execution.
> >>>>>>>>>>
> >>>>>>>>>> I would like to hear from fellow developers and users on what your
> >>>>>>>>>> thoughts are. Please do chime in with your views.
> >>>>>>>>>>
> >>>>>>>>>> Regards
> >>>>>>>>>> Srikanth Sundarrajan
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Regards,
> >>>>>>>>> Venkatesh
> >>>>>>>>>
> >>>>>>>>> “Perfection (in design) is achieved not when there is nothing more to
> >>>>>>>>> add, but rather when there is nothing more to take away.”
> >>>>>>>>> - Antoine de Saint-Exupéry
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Regards,
> >>> Venkatesh
> >>>
> >>> “Perfection (in design) is achieved not when there is nothing more to
> >>> add,
> >>> but rather when there is nothing more to take away.”
> >>> - Antoine de Saint-Exupéry
> >>>
> >>
> >>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbonofre@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
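
For readers following the thread, item 7 of the original proposal quoted above (a gate that opens once a minimum subset of a day's hourly instances is available, e.g. at least 10 of 24) can be illustrated with a minimal sketch. The function names `hourly_instances` and `gate_is_open`, and the in-memory set standing in for "available" data, are assumptions made purely for illustration; nothing here reflects the actual Falcon or Oozie APIs.

```python
from datetime import datetime, timedelta


def hourly_instances(day):
    """Return the 24 hourly instance times expected for the given day."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return [start + timedelta(hours=h) for h in range(24)]


def gate_is_open(expected, available, minimum):
    """Return True once at least `minimum` expected instances are available.

    `available` is a hypothetical stand-in for whatever mechanism reports
    which feed instances have actually landed.
    """
    present = sum(1 for instance in expected if instance in available)
    return present >= minimum


if __name__ == "__main__":
    expected = hourly_instances(datetime(2014, 12, 21))

    # Only 9 of the 24 hourly instances have arrived: the gate stays closed.
    available = set(expected[:9])
    print(gate_is_open(expected, available, minimum=10))

    # A tenth instance arrives (not necessarily contiguous): the gate opens.
    available.add(expected[15])
    print(gate_is_open(expected, available, minimum=10))
```

The check counts available instances rather than requiring a contiguous run, which is one possible reading of "minimum subset"; a real scheduler would also need to decide when to stop waiting.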
 		 	   		  