falcon-dev mailing list archives

From Shwetha GS <shwetha...@inmobi.com>
Subject Re: falcon vs oozie
Date Tue, 16 Sep 2014 11:09:29 GMT
Alex,

Oozie lets you schedule data processing jobs. Its emphasis is on
processing, which you define through workflows and coordinators
(recurring workflows). In a coordinator you can declare the input datasets
and their properties, such as path and frequency. If two coordinators
depend on the same data, these details have to be defined twice. If you
also want data eviction (deleting old data), you have to define yet another
coordinator. Oozie provides APIs to manage these coordinators, but there is
no easy way to define and track the data lifecycle.
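
To make the duplication concrete, here is a rough Python sketch. It is
not Oozie's XML schema or API: the dict layout, the make_coordinator helper,
and the names and paths are all made up for illustration. It only shows that
each coordinator carries its own copy of the dataset details, so a change to
the dataset has to be repeated in every one of them.

```python
# Hypothetical, simplified model of coordinator definitions as plain dicts.
# This is NOT Oozie's XML schema or API; it only illustrates the duplication.

def make_coordinator(name, action, dataset_path, dataset_frequency):
    """Build a coordinator that embeds its own copy of the dataset details."""
    return {
        "name": name,
        "action": action,
        "dataset": {"path": dataset_path, "frequency": dataset_frequency},
    }


# Two processing coordinators plus a third one just for eviction,
# all repeating the same (assumed) path and frequency.
coordinators = [
    make_coordinator("sessionize", "process", "/data/clicks", "hourly"),
    make_coordinator("hourly-report", "process", "/data/clicks", "hourly"),
    make_coordinator("evict-clicks", "delete-old-data", "/data/clicks", "hourly"),
]


def definitions_to_update(path):
    """Names of every coordinator that must be edited if the data at `path` moves."""
    return [c["name"] for c in coordinators if c["dataset"]["path"] == path]


print(definitions_to_update("/data/clicks"))
```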


In contrast, Falcon gives you a data-centric view. Data is defined as a
Feed entity (with a unique name) that contains the data path, the
frequency, the clusters where the data exists, how long the data is
retained in each cluster (eviction), how the data is replicated across
clusters, and so on. The standard data recipes such as acquisition,
eviction, and replication are available directly. To enable data
processing across datasets, Falcon exposes a Process entity, which contains
the input and output feed names (referencing feeds already defined), the
frequency of processing, and how the data should be processed. Data
processing can be defined using a Pig script, a Hive script, or an Oozie
workflow.
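
As a rough illustration of the Feed/Process relationship, here is a small
Python sketch. It is not Falcon's entity schema or API: the Feed, Process,
and EntityStore classes, their field names, and the sample values are
assumptions for illustration. The point is that the data details live in one
named feed, and processes only refer to that feed by name.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Feed:
    """Hypothetical stand-in for a feed: the data details are defined once."""
    name: str
    path: str
    frequency: str
    retention_by_cluster: Dict[str, str]  # cluster name -> retention period


@dataclass
class Process:
    """Hypothetical stand-in for a process: it refers to feeds by name only."""
    name: str
    inputs: List[str]
    outputs: List[str]
    frequency: str
    workflow: str  # e.g. a Pig script, a Hive script, or an Oozie workflow


class EntityStore:
    """Toy registry enforcing unique feed names and valid feed references."""

    def __init__(self) -> None:
        self.feeds: Dict[str, Feed] = {}
        self.processes: Dict[str, Process] = {}

    def submit_feed(self, feed: Feed) -> None:
        if feed.name in self.feeds:
            raise ValueError(f"feed {feed.name!r} already defined")
        self.feeds[feed.name] = feed

    def submit_process(self, process: Process) -> None:
        for feed_name in process.inputs + process.outputs:
            if feed_name not in self.feeds:
                raise KeyError(f"unknown feed {feed_name!r}")
        self.processes[process.name] = process

    def consumers(self, feed_name: str) -> List[str]:
        """Names of the processes that read the given feed, sorted."""
        return sorted(
            p.name for p in self.processes.values() if feed_name in p.inputs
        )


store = EntityStore()
store.submit_feed(Feed("clicks", "/data/clicks", "hourly",
                       {"primary": "7 days", "backup": "90 days"}))
store.submit_feed(Feed("sessions", "/data/sessions", "hourly", {"primary": "30 days"}))
store.submit_feed(Feed("reports", "/data/reports", "hourly", {"primary": "30 days"}))

# Both processes read "clicks" without restating its path or retention.
store.submit_process(Process("sessionize", ["clicks"], ["sessions"], "hourly", "sessionize.pig"))
store.submit_process(Process("hourly-report", ["clicks"], ["reports"], "hourly", "report.hql"))

print(store.consumers("clicks"))
```

If the location or retention of "clicks" changes in this sketch, only the
one Feed definition is touched, and both processes pick it up by name.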

In the backend, the different data lifecycles are implemented using a
scheduler, which is currently Oozie but can be replaced easily. The Falcon
APIs hide the scheduler details and give you an easy way to define and
manage the data lifecycles.
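
The idea of hiding the scheduler can be sketched roughly as follows. This is
not Falcon's internal interface: the Scheduler, InMemoryScheduler, and
LifecycleManager classes and the job naming are assumptions for
illustration. The lifecycle layer turns a feed into eviction and replication
jobs and hands them to whatever scheduler it was given.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class Scheduler(ABC):
    """Hypothetical scheduler interface; a real backend could sit behind it."""

    @abstractmethod
    def schedule(self, job_name: str, frequency: str) -> None:
        ...


class InMemoryScheduler(Scheduler):
    """Stand-in backend that only records what it was asked to run."""

    def __init__(self) -> None:
        self.jobs: List[Tuple[str, str]] = []

    def schedule(self, job_name: str, frequency: str) -> None:
        self.jobs.append((job_name, frequency))


class LifecycleManager:
    """Translates a feed into lifecycle jobs without knowing the backend."""

    def __init__(self, scheduler: Scheduler) -> None:
        self._scheduler = scheduler

    def schedule_feed(self, feed_name: str, frequency: str, clusters: List[str]) -> None:
        # One eviction job per cluster that holds the data.
        for cluster in clusters:
            self._scheduler.schedule(f"{feed_name}-eviction-{cluster}", frequency)
        # Assumption: the first cluster is the source, the rest are targets.
        source = clusters[0]
        for target in clusters[1:]:
            self._scheduler.schedule(
                f"{feed_name}-replication-{source}-to-{target}", frequency
            )


backend = InMemoryScheduler()
manager = LifecycleManager(backend)
manager.schedule_feed("clicks", "hourly", ["primary", "backup"])

for job in backend.jobs:
    print(job)
```

Swapping the backend in this sketch means passing a different Scheduler
implementation to LifecycleManager; the caller's view of the feed does not
change.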

Regards,
Shwetha

On Tue, Sep 9, 2014 at 9:01 PM, Alex Nastetsky <anastetsky@spryinc.com>
wrote:

> Hi,
>
> I have a general usage question about Falcon. I don't see a "user" mailing
> list, so I am sending it here. If there's a better place to direct the
> question, please let me know.
>
> I have been looking at the OnBoarding:
> http://falcon.incubator.apache.org/docs/OnBoarding.html
>
> I understand that Falcon uses Oozie underneath. What is the advantage of
> using Falcon instead of using Oozie directly?
>
> It looks like you can specify in your Input Feed information about your
> input data, but you can parameterize your paths in Oozie as well (using
> job.properties).
>
> I have also heard conflicting information about whether Falcon generates
> Oozie workflow.xml files, but in that on-boarding example, it looks like
> you need to create the workflow.xml manually. Which is correct?
>
> Thanks in advance,
> Alex.
>
