falcon-dev mailing list archives

From Jean-Baptiste Onofré ...@nanthrax.net>
Subject Re: [DISCUSS] Orchestration in Falcon
Date Thu, 01 Jan 2015 14:21:22 GMT
+1

Regards
JB

On 12/31/2014 03:53 PM, Srikanth Sundarrajan wrote:
> Can we pick up this thread in the new year when folks are back from break?
> I am in total agreement with Venkatesh here. We ought to have a long-term,
> sustainable approach. Also, I feel that getting the capabilities we would
> like to enable in Falcon done through Oozie in the near term seems to be a
> tall ask anyway.
>
> Regards
> Srikanth Sundarrajan
>
>> Date: Tue, 23 Dec 2014 16:44:06 -0800
>> Subject: Re: [DISCUSS] Orchestration in Falcon
>> From: venkatesh@innerzeal.com
>> To: dev@falcon.incubator.apache.org
>>
>> Chugging along with Oozie is bad for Falcon in the long run, for users and
>> developers. It's horribly complex to work through the many architectural
>> rough edges in Oozie. Look at all the patches for security that I had
>> to fix around Oozie. It's unnecessarily complex and non-uniform, and it is
>> NOT meant to be used by another tool like Falcon; it was built around the
>> end user.
>>
>> This is a good discussion to have: maybe explore Oozie for the short term,
>> but look at alternative solutions for the long term.
>>
>> On Tue, Dec 23, 2014 at 7:28 AM, Srikanth Sundarrajan <sriksun@hotmail.com>
>> wrote:
>>
>>> @jb, there is no doubt merit in mapping them to Oozie if possible, and if
>>> the extensions are simple and straightforward enough.
>>>
>>> I also had a quick chat offline with Shwetha, and she mentioned some
>>> work happening in Oozie in this regard. On further digging, I found
>>> https://issues.apache.org/jira/browse/OOZIE-1976. This is possibly what
>>> Shwetha was referring to. From the looks of it, this tries to address item
>>> #7 in the original thread. Maybe there are more JIRAs where additional
>>> work, such as aperiodic datasets, is being done. Perhaps @Shwetha can
>>> throw some light on what is being considered and/or how these
>>> gating/orchestration use cases can be managed.
>>>
>>> Regards
>>> Srikanth Sundarrajan
>>>
>>>> Date: Tue, 23 Dec 2014 11:06:24 +0100
>>>> From: jb@nanthrax.net
>>>> To: dev@falcon.incubator.apache.org
>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
>>>>
>>>> Hi all,
>>>>
>>>> I second Shwetha there. I think we can achieve such features in Oozie
>>>> (with some adaptations).
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 2014-12-23 10:53, Shwetha G S wrote:
>>>>> If we can get rid of Oozie entirely, yes, we can explore other
>>>>> possibilities. But if we are still going to use Oozie for DAG
>>>>> execution, we are going to add another bottleneck in the whole
>>>>> execution (currently, Falcon is not in the workflow execution path),
>>>>> and I don't think it's worth it.
>>>>>
>>>>> The features outlined above are all available in basic forms in
>>>>> Oozie, and it should be easy to enhance them or make them extension
>>>>> points.
>>>>>
>>>>> -Shwetha
>>>>>
>>>>> On Tue, Dec 23, 2014 at 8:12 AM, Srikanth Sundarrajan <sriksun@hotmail.com>
>>>>> wrote:
>>>>>
>>>>>> Here are a few more gaps that we ought to solve for while we are on
>>>>>> the subject:
>>>>>>
>>>>>> 1. Ability to attach to start and finish events of workflow execution.
>>>>>> Currently we have a post-processing hook to listen to finish events,
>>>>>> but we do run into scenarios where there are occasional failures with
>>>>>> post-processing, and there is a potential phase lag in learning about
>>>>>> the events.
>>>>>> 2. Strict enforcement of concurrency control, possibly spanning process
>>>>>> boundaries.
>>>>>> 3. Ability to tune how backlogs are caught up (older instances given
>>>>>> higher priority, newer instances given higher priority, or some sort
>>>>>> of weights to allow both to make progress at varying rates). As an
>>>>>> alternative, users have asked for routing current vs. older instances
>>>>>> to different queues.
>>>>>> 4. Ability to have a notion of non-time-based feed instances and
>>>>>> related coordination.
>>>>>> 5. Currently, keeping track of and managing SLAs is also a challenge,
>>>>>> but with #1 addressed, this might be less of a concern.
>>>>>>
>>>>>> Regards
>>>>>> Srikanth Sundarrajan
>>>>>>
>>>>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
>>>>>>> From: sriksun@hotmail.com
>>>>>>> Date: Tue, 23 Dec 2014 06:30:30 +0530
>>>>>>> To: dev@falcon.incubator.apache.org
>>>>>>>
>>>>>>> @venkatesh, the question really is how we enable these gating
>>>>>>> preconditions. It seems hard enough to add them to Oozie, but I am not
>>>>>>> familiar enough with Oozie to comment on how hard or easy it is. As I
>>>>>>> responded to @ajay on the same thread, if we are to do away with
>>>>>>> coordination through Oozie, we can follow up this discussion with
>>>>>>> approaches and design. Though I had Quartz in mind, I wanted to leave
>>>>>>> it out of the discussion to see if there is consensus for moving away
>>>>>>> from Oozie coordinators and implementing them through other means.
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>>> On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh" <venkatesh@innerzeal.com> wrote:
>>>>>>>>
>>>>>>>> What is the purpose of this decoupling? Why build this into Falcon?
>>>>>>>> Scheduling is so common that schedulers are a dime a dozen today,
>>>>>>>> and they are all extensible with custom triggers. Making it part of
>>>>>>>> Falcon will lead to the same issues that Oozie has today.
>>>>>>>>
>>>>>>>> I'm sorry, but I'm a HUGE -1 to this being built into the Falcon
>>>>>>>> codebase.
>>>>>>>>
>>>>>>>> However, I'm +1 to reusing the Quartz scheduler that already exists:
>>>>>>>> stand it up outside or embed it like we do for ActiveMQ.
>>>>>>>>
>>>>>>>> Phase 2: I'd like to see us write a simple DAG execution layer in
>>>>>>>> YARN as an app master, without a DB, that keeps state on HDFS, as an
>>>>>>>> alternative to Oozie.
>>>>>>>>
>>>>>>>> Then we will have a nimble Falcon which can kick ass.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan <sriksun@hotmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello Team,
>>>>>>>>>
>>>>>>>>> Since its inception, Falcon has used Oozie for process orchestration
>>>>>>>>> as well as feed lifecycle phase executions. While this has worked
>>>>>>>>> reasonably well and allowed us to make higher-level capabilities
>>>>>>>>> available through Falcon, we are increasingly seeing scenarios where
>>>>>>>>> it is proving to be a limiting factor. In its current form, Falcon
>>>>>>>>> relies on Oozie for both scheduling and workflow execution, due to
>>>>>>>>> which scheduling is limited to time-based/cron-based scheduling with
>>>>>>>>> additional gating conditions on data availability. This also imposes
>>>>>>>>> the restriction that datasets be periodic/cyclic in nature.
>>>>>>>>>
>>>>>>>>> From an orchestration standpoint, it would help if we could support
>>>>>>>>> standard gating/scheduling primitives via Falcon:
>>>>>>>>>
>>>>>>>>> 1. Simple periodic scheduling with no gating conditions
>>>>>>>>> 2. Cron-based scheduling (day of week, day of month, specific hours,
>>>>>>>>> and non-periodic) with no gating conditions
>>>>>>>>> 3. Availability of new data (assuming monotonically increasing data
>>>>>>>>> versions, availability of new versions)
>>>>>>>>> 4. Changes to existing data (reinstatement, similar to late data
>>>>>>>>> handling)
>>>>>>>>> 5. External triggers/notifications
>>>>>>>>> 6. Availability of specific instances of data declared as a mandatory
>>>>>>>>> dependency
>>>>>>>>> 7. Availability of a minimum subset of instances of data declared as a
>>>>>>>>> mandatory dependency (for example, at least 10 of the 24 hourly
>>>>>>>>> instances in a day)
>>>>>>>>> 8. Valid combinations of the above
>>>>>>>>>
>>>>>>>>> In this context, I would like to propose that we move away from Oozie
>>>>>>>>> for the orchestration requirements and implement them natively within
>>>>>>>>> Falcon. It will no doubt make the Falcon server bulkier and heavier in
>>>>>>>>> both code and deployment, but without it, orchestration within Falcon
>>>>>>>>> seems likely to remain limited by the capabilities available in Oozie.
>>>>>>>>>
>>>>>>>>> Please note that this suggestion is restricted to scheduling, not
>>>>>>>>> workflow execution.
>>>>>>>>>
>>>>>>>>> I would like to hear your thoughts, fellow developers and users.
>>>>>>>>> Please do chime in with your views.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Srikanth Sundarrajan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Venkatesh
>>>>>>>>
>>>>>>>> “Perfection (in design) is achieved not when there is nothing more to add,
>>>>>>>> but rather when there is nothing more to take away.”
>>>>>>>> - Antoine de Saint-Exupéry
>>>>>>
>>>>>>
>>>
>>>
>>
>>
>>
>> --
>> Regards,
>> Venkatesh
>>
>> “Perfection (in design) is achieved not when there is nothing more to add,
>> but rather when there is nothing more to take away.”
>> - Antoine de Saint-Exupéry
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
