falcon-dev mailing list archives

From Venkat Ranganathan <vranganat...@hortonworks.com>
Subject Re: [DISCUSS] Orchestration in Falcon
Date Wed, 08 Apr 2015 20:42:18 GMT
All good thoughts.  I like the staged approach of offloading some of the cumbersome Oozie features
and relying on the workflow engine.   As we all would agree, this is an area where we could
leverage the maturity and support for various action types which is provided by Oozie today
- and this will also help us integrate with other scheduling solutions more easily in future.


On 4/7/15, 10:10 AM, "Srikanth Sundarrajan" <sriksun@hotmail.com> wrote:

>Hi Arpit,
>    Thanks for bringing this up. The idea that was originally proposed in http://mail-archives.apache.org/mod_mbox/falcon-dev/201412.mbox/%3CBLU179-W931569420E9A4C4B4DDF57A4690@phx.gbl%3E
was to simply do away with the Oozie coordinator but continue to use the Oozie workflow engine
for user workflows (this will also ensure continuity for users). This will ensure that
much of the security-related complexity wouldn't have to be re-implemented in Falcon and
will continue to be delegated to Oozie.
>Srikanth Sundarrajan
>> Subject: Re: [DISCUSS] Orchestration in Falcon
>> From: arpit@hortonworks.com
>> To: dev@falcon.apache.org
>> Date: Tue, 7 Apr 2015 17:02:16 +0000
>> One thing to consider: Oozie does handle most of the various security-related configurations
seamlessly from Falcon's perspective. If we were to build this in Falcon, all of that logic
would have to move to Falcon. Hopefully, if we get rid of what you mention in #6, some of the
things might become simpler.
>> However, what Falcon will have to start supporting is allowing users to provide configuration
of clusters (Hadoop and not Falcon clusters) on the local filesystem, just like Oozie does;
based on the read, write and execute URI, it reads the appropriate configs and passes them
>> The above will be needed regardless of security. Falcon currently assumes that
the Hadoop configs it is running with apply to all clusters it might deal with, which might
not be true. Because of the above feature in Oozie, it is able to handle this in most cases.
We still have this limitation in our current model, but I think this feature would be needed
if we were to move the scheduler into Falcon.
>> --
>> Arpit Gupta
>> Hortonworks Inc.
>> http://hortonworks.com/
>>> On Apr 7, 2015, at 9:49 AM, Srikanth Sundarrajan <sriksun@hotmail.com> wrote:
>>> I am fully behind this for the following reasons:
>>> 1. Managing the scheduling capability (regardless of the feasibility or infeasibility)
in Oozie means that all changes have to make it into Oozie upstream and be released before they
can be used from within Falcon.
>>> 2. Supporting new gating & throttling primitives with awareness of the dependencies
between entities seems to ask for too many changes in Oozie to be done incrementally. This
might require at least some major design changes in Oozie.
>>> 3. As many in the falcon-dev community would agree, it would be ideal for falcon
to be less dependent on Oozie in the long run.
>>> 4. It would be easier and simpler to handle stream datasets if Falcon were to
directionally support these in the near or far future.
>>> 5. Currently there is a lot of bloat in the scheduler integration because of the
way Oozie functions, and this complexity will reduce if we have a simpler scheduler to
integrate with.
>>> 6. The notion of a parent workflow (with associated pre-processing & post-processing),
which adds overhead by occupying a slot in the cluster, is also begging for attention and improvement.
>>> Regards
>>> Srikanth Sundarrajan
>>> ----------------------------------------
>>>> Date: Tue, 7 Apr 2015 11:27:52 +0530
>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
>>>> From: pallavi.rao@inmobi.com
>>>> To: dev@falcon.apache.org
>>>> Hi,
>>>> I was recently looking at some of the use cases at InMobi and how to
>>>> enhance Falcon to accommodate those and I realized that due to our
>>>> dependency on Oozie coordinator, some of these cannot be easily achieved or
>>>> take a much longer cycle as we have to wait for Oozie to add some
>>>> functionality.
>>>> I was pointed to this thread that dates slightly before my time in Falcon
>>>> (https://www.mail-archive.com/dev@falcon.incubator.apache.org/msg09268.html).
>>>> I wanted to reopen the thread for discussion, with my 2 cents:
>>>> 1. Some of the scheduling primitives that are already mentioned in the
>>>> thread, especially support for aperiodic datasets or external triggering
>>>> mechanisms are not available in Oozie. It might not even be a natural fit
>>>> for Oozie to add these.
>>>> 2. Adding new primitives in Falcon becomes harder and longer as we
>>>> completely depend on Oozie for the same. Extensibility of Falcon is stunted.
>>>> 3. Oozie has very limited support for throttling resource utilization.
>>>> We can only control the no. of parallel instances of a coordinator job.
>>>> 4. Oozie currently has no notion of interdependency of
>>>> instances/workflows, whereas in Falcon it will be very useful to
>>>> gate/throttle based on the interdependency. For example, re-run a pipeline
>>>> (or a subset) or throttle resource utilization of a pipeline when in
>>>> "backlog catchup" mode.
>>>> 5. We end up with bugs like FALCON-1127
>>>> <https://issues.apache.org/jira/browse/FALCON-1127>, because Falcon
>>>> constantly needs to play catchup with Oozie changes.
>>>> On the thread, most people did seem to be in favor of a native scheduler
>>>> in Falcon. If you all think this is useful, I'll volunteer to start work on
>>>> this and we can build out a scheduler/orchestrator in Falcon that can open
>>>> up a whole lot of possibilities for Falcon users.
>>>> Thanks,
>>>> Pallavi