falcon-dev mailing list archives

From Srikanth Sundarrajan <srik...@hotmail.com>
Subject RE: [DISCUSS] Orchestration in Falcon
Date Tue, 07 Apr 2015 17:10:14 GMT
Hi Arpit,
    Thanks for bringing this up. The idea originally proposed in http://mail-archives.apache.org/mod_mbox/falcon-dev/201412.mbox/%3CBLU179-W931569420E9A4C4B4DDF57A4690@phx.gbl%3E
was simply to do away with the Oozie coordinator, but continue to use the Oozie workflow engine
for the user workflows (this will also ensure continuity for users). That way, much of the
security-related complexity would not have to be re-implemented in Falcon and would continue
to be delegated to Oozie.

Srikanth Sundarrajan

> Subject: Re: [DISCUSS] Orchestration in Falcon
> From: arpit@hortonworks.com
> To: dev@falcon.apache.org
> Date: Tue, 7 Apr 2015 17:02:16 +0000
> One thing to consider: Oozie handles most of the security-related configuration
> seamlessly from Falcon's perspective. If we were to build this in Falcon, all of that logic
> would have to move to Falcon. Hopefully, if we get rid of what you mention in #6, some of
> these things might become simpler.
> However, Falcon will have to start allowing users to provide the configuration of clusters
> (Hadoop clusters, not Falcon clusters) on the local filesystem, just as Oozie does. Based on
> the read, write and execute URIs, it reads the appropriate configs and passes them
> The above will be needed regardless of whether security is enabled. Falcon currently assumes
> that the Hadoop configs it is running with apply to all clusters it might deal with, which
> might not be true. Because of the above feature, Oozie is able to handle this in most cases.
> We still have this limitation in our current model, but I think this feature would be needed
> if we were to move the scheduler into Falcon.
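To illustrate the per-cluster config lookup described above: in Oozie this is driven by the
oozie.service.HadoopAccessorService.hadoop.configurations property, a comma-separated list of
AUTHORITY=HADOOP_CONF_DIR entries in which * acts as the default. The following is only a
minimal Python sketch of that kind of lookup, not Oozie or Falcon code; the function names,
host names and directory paths are hypothetical.

```python
from urllib.parse import urlparse


def parse_conf_mapping(spec):
    """Parse 'authority=dir,authority=dir' into a dict ('*' is the default entry)."""
    mapping = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        authority, _, conf_dir = entry.partition("=")
        mapping[authority.strip().lower()] = conf_dir.strip()
    return mapping


def conf_dir_for(uri, mapping):
    """Return the conf dir for the URI's host:port, falling back to '*' if present."""
    authority = urlparse(uri).netloc.lower()
    if authority in mapping:
        return mapping[authority]
    return mapping.get("*")


# Hypothetical clusters and local conf directories.
mapping = parse_conf_mapping(
    "*=/etc/hadoop/conf, nn1.example.com:8020=/etc/hadoop/cluster1"
)
print(conf_dir_for("hdfs://nn1.example.com:8020/data/in", mapping))
print(conf_dir_for("hdfs://nn2.example.com:8020/data/out", mapping))
```

The first lookup matches the explicit entry and the second falls back to the * entry. A
scheduler hosted in Falcon would need some equivalent of this to pick the right Hadoop configs
for each read, write and execute endpoint.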
> --
> Arpit Gupta
> Hortonworks Inc.
> http://hortonworks.com/
>> On Apr 7, 2015, at 9:49 AM, Srikanth Sundarrajan <sriksun@hotmail.com> wrote:
>> I am fully behind this for the following reasons:
>> 1. Managing the scheduling capability (regardless of the feasibility or infeasibility)
>> in Oozie means that all changes have to make it into Oozie upstream and be released before
>> they can be used from within Falcon.
>> 2. Supporting new gating & throttling primitives with awareness of dependencies
>> between entities seems to ask for too many changes in Oozie to be done incrementally. This
>> might require at least some major design changes in Oozie.
>> 3. As many in the falcon-dev community would agree, it would be ideal for Falcon
>> to be less dependent on Oozie in the long run.
>> 4. It would be easier and simpler to handle stream datasets if Falcon were to move toward
>> supporting these in the near or far future.
>> 5. Currently there is a lot of bloat in the scheduler integration because of the way
>> Oozie functions, and this complexity will reduce if we have a simpler scheduler to integrate
>> 6. The overhead of the parent workflow (and its associated pre-processing & post-processing),
>> which occupies a slot in the cluster, is also begging for attention and improvement.
>> Regards
>> Srikanth Sundarrajan
>> ----------------------------------------
>>> Date: Tue, 7 Apr 2015 11:27:52 +0530
>>> Subject: Re: [DISCUSS] Orchestration in Falcon
>>> From: pallavi.rao@inmobi.com
>>> To: dev@falcon.apache.org
>>> Hi,
>>> I was recently looking at some of the use cases at InMobi and how to
>>> enhance Falcon to accommodate those and I realized that due to our
>>> dependency on Oozie coordinator, some of these cannot be easily achieved or
>>> take a much longer cycle as we have to wait for Oozie to add some
>>> functionality.
>>> I was pointed to this thread that dates slightly before my time in Falcon (
>>> https://www.mail-archive.com/dev@falcon.incubator.apache.org/msg09268.html).
>>> I wanted to reopen the thread for discussion, with my 2 cents:
>>> 1. Some of the scheduling primitives that are already mentioned in the
>>> thread, especially support for aperiodic datasets or external triggering
>>> mechanisms, are not available in Oozie. It might not even be a natural fit
>>> for Oozie to add these.
>>> 2. Adding new primitives in Falcon becomes harder and takes longer, as we
>>> depend completely on Oozie for the same. Extensibility of Falcon is stunted.
>>> 3. Oozie has very limited support for throttling resource utilization.
>>> We can only control the number of parallel instances of a coordinator job.
>>> 4. Oozie currently has no notion of interdependency between
>>> instances/workflows, whereas in Falcon it would be very useful to
>>> gate/throttle based on the interdependency. For example, re-run a pipeline
>>> (or a subset) or throttle resource utilization of a pipeline when in
>>> "backlog catchup" mode.
>>> 5. We end up with bugs like FALCON-1127
>>> <https://issues.apache.org/jira/browse/FALCON-1127>, because Falcon
>>> constantly needs to play catchup with Oozie changes.
>>> On the thread, most people did seem to be in favor of a native scheduler in
>>> Falcon. If you all think this is useful, I'll volunteer to start work on
>>> this and we can build out a scheduler/orchestrator in Falcon that can open
>>> up a whole lot of possibilities for Falcon users.
>>> Thanks,
>>> Pallavi