falcon-dev mailing list archives

From Idris Ali <psychid...@gmail.com>
Subject Re: [DISCUSS] Orchestration in Falcon
Date Thu, 01 Jan 2015 17:10:48 GMT
+1.

A few more relevant asks:
1. Support for a "Last Only" option for process scheduling (in addition to
LIFO/FIFO); Oozie currently has some issues.
2. Support for a singleton process (lock-based); the behaviour of all
instances of the process is the same.

Thanks,
-Idris
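
A minimal sketch of how the "Last Only" option in item 1 could differ from FIFO and LIFO. It assumes "Last Only" means that only the newest pending instance is run and older pending instances are skipped; the thread does not define it. The names `Order` and `select_instances` are hypothetical and are not Falcon or Oozie APIs.

```python
from enum import Enum


class Order(Enum):
    """Hypothetical instance-ordering policies (not a Falcon or Oozie API)."""
    FIFO = "FIFO"
    LIFO = "LIFO"
    LAST_ONLY = "LAST_ONLY"


def select_instances(pending, order):
    """Return pending instance times in the order they should be run.

    `pending` is any iterable of comparable nominal times
    (for example datetime objects or integers).
    """
    ordered = sorted(pending)
    if order is Order.FIFO:
        # Oldest instance first.
        return ordered
    if order is Order.LIFO:
        # Newest instance first.
        return ordered[::-1]
    if order is Order.LAST_ONLY:
        # Assumption: run only the newest instance and skip the rest.
        return ordered[-1:]
    raise ValueError(f"unknown order: {order!r}")
```

With pending instances `[3, 1, 2]`, FIFO yields `[1, 2, 3]`, LIFO yields `[3, 2, 1]`, and this reading of "Last Only" yields `[3]`.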


On Thu, Jan 1, 2015 at 7:51 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
wrote:

> +1
>
> Regards
> JB
>
>
> On 12/31/2014 03:53 PM, Srikanth Sundarrajan wrote:
>
>> Can we pick up this thread in the new year when folks are back from
>> break? I am in total agreement with Venkatesh here. We ought to have a
>> long-term, sustainable approach. Also, I feel that getting the capabilities
>> we would like to enable on Falcon done through Oozie in the near term is a
>> tall ask anyway.
>>
>> Regards
>> Srikanth Sundarrajan
>>
>>> Date: Tue, 23 Dec 2014 16:44:06 -0800
>>> Subject: Re: [DISCUSS] Orchestration in Falcon
>>> From: venkatesh@innerzeal.com
>>> To: dev@falcon.incubator.apache.org
>>>
>>> Chugging along with Oozie is bad for Falcon in the long run, for users and
>>> developers. It's horribly complex to work through the many architectural
>>> rough edges in Oozie. Look at all the patches for security that I had to
>>> fix around Oozie. It's unnecessarily complex, non-uniform, and is NOT meant
>>> to be used by another tool like Falcon; it was built around the end user.
>>>
>>> This is a good discussion to have - maybe explore Oozie for the short term
>>> but look at alternative solutions for the long term.
>>>
>>> On Tue, Dec 23, 2014 at 7:28 AM, Srikanth Sundarrajan <sriksun@hotmail.com>
>>> wrote:
>>>
>>>> @jb, there is no doubt merit in mapping them to Oozie if possible and if
>>>> the extensions are simple and straightforward enough.
>>>>
>>>> I also had a quick chat offline with Shwetha, and she mentioned some
>>>> work happening in Oozie in this regard. On further digging, I found
>>>> https://issues.apache.org/jira/browse/OOZIE-1976. This is possibly what
>>>> Shwetha was referring to. From the looks of it, this tries to address
>>>> item #7 in the original thread. Maybe there are more JIRAs where
>>>> additional work, such as aperiodic datasets, is being worked on. Perhaps
>>>> @Shwetha can throw some light on what is being considered and/or how
>>>> these gating/orchestration use cases can be managed.
>>>>
>>>> Regards
>>>> Srikanth Sundarrajan
>>>>
>>>>> Date: Tue, 23 Dec 2014 11:06:24 +0100
>>>>> From: jb@nanthrax.net
>>>>> To: dev@falcon.incubator.apache.org
>>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I second Shwetha there. I think we can achieve such features in Oozie
>>>>> (with some adaptations).
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 2014-12-23 10:53, Shwetha G S wrote:
>>>>>
>>>>>> If we can get rid of Oozie entirely, then yes, we can explore other
>>>>>> possibilities. But if we are still going to use Oozie for DAG
>>>>>> execution, we are going to add another bottleneck in the whole
>>>>>> execution (currently, Falcon is not in the workflow execution path),
>>>>>> and I don't think it's worth it.
>>>>>>
>>>>>> The features outlined above are all available in basic forms in
>>>>>> Oozie, and it should be easy to enhance them or make them extension
>>>>>> points.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Shwetha
>>>>>>
>>>>>> On Tue, Dec 23, 2014 at 8:12 AM, Srikanth Sundarrajan
>>>>>> <sriksun@hotmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Here are a few more gaps that we ought to solve for while we are on the
>>>>>>> subject:
>>>>>>>
>>>>>>> 1. Ability to attach to start & finish events of workflow execution.
>>>>>>> Currently we have a post-processing hook to listen to finish events, but
>>>>>>> we do run into scenarios where there are occasional failures with
>>>>>>> post-processing, and there is a potential phase lag in learning about
>>>>>>> the events.
>>>>>>> 2. Strict enforcement of concurrency control, possibly spanning process
>>>>>>> boundaries.
>>>>>>> 3. Ability to tune how backlogs are caught up (older instances given
>>>>>>> higher priority, newer instances given higher priority, or some sort of
>>>>>>> weights to allow both to make progress at varying rates). As an
>>>>>>> alternative, users have asked for routing current vs. older instances to
>>>>>>> different queues.
>>>>>>> 4. Ability to have a notion of non-time-based feed instances and related
>>>>>>> coordination.
>>>>>>> 5. Currently, keeping track of and managing SLAs is also a challenge, but
>>>>>>> with #1 addressed, this might be a lesser concern.
>>>>>>>
>>>>>>> Regards
>>>>>>> Srikanth Sundarrajan
>>>>>>>
>>>>>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
>>>>>>>> From: sriksun@hotmail.com
>>>>>>>> Date: Tue, 23 Dec 2014 06:30:30 +0530
>>>>>>>> To: dev@falcon.incubator.apache.org
>>>>>>>>
>>>>>>>> @venkatesh, the question really is how we enable these gating
>>>>>>>> preconditions. It seems hard enough to add them to Oozie, but I am not
>>>>>>>> familiar enough with Oozie to comment on how hard or easy it is. As I
>>>>>>>> responded to @ajay on the same thread, if we are to do away with
>>>>>>>> coordination through Oozie, we can follow up this discussion with
>>>>>>>> approaches and design. Though I had Quartz in mind, I wanted to leave
>>>>>>>> that out of the discussion to see if there is consensus for moving away
>>>>>>>> from Oozie coords and implementing them through other means.
>>>>>>>>
>>>>>>>> Sent from my iPhone
>>>>>>>>
>>>>>>>>> On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh"
>>>>>>>>> <venkatesh@innerzeal.com> wrote:
>>>>>>>>>
>>>>>>>>> What is the purpose of this decoupling? Why build this into Falcon?
>>>>>>>>> Scheduling is so common that schedulers are a dime a dozen today, and
>>>>>>>>> they are all extensible with custom triggers. Making it part of Falcon
>>>>>>>>> will suffer the same issues that Oozie has today.
>>>>>>>>>
>>>>>>>>> I'm sorry but I'm a HUGE -1 to this being built into the Falcon
>>>>>>>>> codebase.
>>>>>>>>>
>>>>>>>>> However, I'm +1 to reusing the Quartz scheduler that already exists -
>>>>>>>>> stand it up outside or embed it like we do for ActiveMQ.
>>>>>>>>>
>>>>>>>>> Phase 2 - I'd like to see us write a simple DAG execution layer in YARN
>>>>>>>>> as an app master, without a DB and keeping state on HDFS, as an
>>>>>>>>> alternative to Oozie.
>>>>>>>>>
>>>>>>>>> Then we will have a nimble Falcon which can kick ass.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan
>>>>>>>>> <sriksun@hotmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Team,
>>>>>>>>>>
>>>>>>>>>> Since its inception, Falcon has used Oozie for process orchestration as
>>>>>>>>>> well as feed life cycle phase executions. While this has worked
>>>>>>>>>> reasonably well and allowed us to make higher-level capabilities
>>>>>>>>>> available through Falcon, we are increasingly seeing scenarios where
>>>>>>>>>> this is proving to be a limiting factor. In its current form, Falcon
>>>>>>>>>> relies on Oozie for both scheduling and workflow execution, due to
>>>>>>>>>> which scheduling is limited to time-based/cron-based scheduling with
>>>>>>>>>> additional gating conditions on data availability. This also restricts
>>>>>>>>>> datasets to being periodic/cyclic in nature.
>>>>>>>>>>
>>>>>>>>>> From an orchestration standpoint, it would help if we could support
>>>>>>>>>> standard gating / scheduling primitives via Falcon:
>>>>>>>>>>
>>>>>>>>>> 1. Simple periodic scheduling with no gating conditions
>>>>>>>>>> 2. Cron-based scheduling (day of week, day of the month, specific hours
>>>>>>>>>> and non-periodic) with no gating conditions
>>>>>>>>>> 3. Availability of new data (assuming monotonically increasing data
>>>>>>>>>> version, availability of new versions)
>>>>>>>>>> 4. Changes to existing data (reinstatement - similar to late data
>>>>>>>>>> handling)
>>>>>>>>>> 5. External trigger/notifications
>>>>>>>>>> 6. Availability of specific instances of data declared as a mandatory
>>>>>>>>>> dependency
>>>>>>>>>> 7. Availability of a minimum subset of instances of data declared as a
>>>>>>>>>> mandatory dependency (for example, at least 10 hourly instances of a day
>>>>>>>>>> with 24 instances)
>>>>>>>>>> 8. Valid combinations of the above.
>>>>>>>>>>
>>>>>>>>>> In this context, I would like to propose that we move away from Oozie
>>>>>>>>>> for the orchestration requirements and have them implemented natively
>>>>>>>>>> within Falcon. It will no doubt make the Falcon server bulkier and
>>>>>>>>>> heavier in both code and deployment, but it seems that without it, the
>>>>>>>>>> orchestration within Falcon will be limited by the capabilities
>>>>>>>>>> available within Oozie.
>>>>>>>>>>
>>>>>>>>>> Please do note that this suggestion is restricted to the scheduling and
>>>>>>>>>> not to the workflow execution.
>>>>>>>>>>
>>>>>>>>>> I would like to hear from fellow developers and users what your thoughts
>>>>>>>>>> are. Please do chime in with your views.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Srikanth Sundarrajan
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Venkatesh
>>>>>>>>>
>>>>>>>>> “Perfection (in design) is achieved not when there is nothing more to
>>>>>>>>> add, but rather when there is nothing more to take away.”
>>>>>>>>> - Antoine de Saint-Exupéry
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Venkatesh
>>>
>>> “Perfection (in design) is achieved not when there is nothing more to
>>> add, but rather when there is nothing more to take away.”
>>> - Antoine de Saint-Exupéry
>>>
>>
>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
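
Item 7 of the original proposal quoted above (a minimum subset of mandatory instances, such as at least 10 of a day's 24 hourly instances) could be expressed as a simple gating check. This is only a sketch under stated assumptions: instances are identified by hashable keys, and the name `minimum_subset_available` is hypothetical rather than part of Falcon or Oozie.

```python
def minimum_subset_available(expected, available, minimum):
    """Return True when at least `minimum` expected instances are available.

    `expected`  - identifiers of all instances in the window (e.g. 24 hours)
    `available` - identifiers of instances that have actually arrived
    `minimum`   - how many of the expected instances must be present
    """
    # Count only arrivals that belong to the expected window.
    present = set(expected) & set(available)
    return len(present) >= minimum


# Example: a day with 24 hourly instances, gated on at least 10 of them.
hours_in_day = range(24)
gate_open = minimum_subset_available(hours_in_day, range(10), minimum=10)
```

Here `gate_open` is `True` because exactly 10 of the 24 expected instances are present; with only 9 arrivals the gate would stay closed.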
