nifi-users mailing list archives

From Boris Tyukin <bo...@boristyukin.com>
Subject Re: Ingestion from databases: pure NiFi vs Kylo with Sqoop
Date Sat, 04 Aug 2018 21:05:31 GMT
Vitaly,

The best way is to try it yourself and build a simple process to prove your
case.

I got excited about Kylo at first, but quickly realized I could do
everything I needed with NiFi. I did not really care about Kylo's fancy UI,
but I did love a lot of things - integration with Spark and Sqoop,
templates for pipelines, centralized monitoring etc. But at the same time,
it is someone else's product, lagging behind NiFi, with tons of other
dependencies and packages, built by that company.

I do believe you don't have to use Sqoop if you don't want it - you can
build your own templates in Kylo, which would be just a NiFi flow with
parameters, and use JDBC SQL processors instead.

Now, you will be missing a lot of cool Sqoop features. One example is
direct database connectors (for Oracle, for example), which give much
better performance. Time zone handling is another.

Until recently, NiFi could not ingest a table concurrently - with Sqoop I
can run 32 mappers, and it will break the table into 32 pieces and ingest
them into HDFS in parallel.
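
Just to give a concrete picture, a typical invocation looks something like
this (the connection string, table, and split column are placeholders, not
from any real setup):

    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user -P \
      --table SALES.ORDERS \
      --split-by ORDER_ID \
      --num-mappers 32 \
      --target-dir /data/raw/orders \
      --direct

The --split-by column is what Sqoop uses to break the table into 32 ranges,
one per mapper, and --direct turns on the database-specific connector where
one is available.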

NiFi has a similar ability now, but I think until NiFi 1.6 you had to use
primary keys or something like that. I think this has been improved
recently, and the GenerateTableFetch processor can do a lot, like breaking
a table into pieces, and it also supports incremental loads.
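
For illustration, the relevant GenerateTableFetch properties look roughly
like this (property names from memory, so check the docs for your NiFi
version; the table and column names are just placeholders):

    GenerateTableFetch
      Database Connection Pooling Service: <your DBCPConnectionPool>
      Table Name: SALES.ORDERS
      Maximum-value Columns: LAST_MODIFIED
      Partition Size: 100000

Each FlowFile it emits carries one paged SELECT statement, so you can
distribute them across a cluster and execute them with ExecuteSQL.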

Speaking of incrementals, I also wanted to build my own framework around
incremental loads, with my own control table, auditing, and logging. I did
not use Sqoop's incremental load feature, but some devs love it.
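
If it helps, the control-table idea is something like this (a made-up
sketch, not Kylo's or Sqoop's schema):

    CREATE TABLE ingest_control (
      source_table   VARCHAR(128) PRIMARY KEY, -- table being ingested
      last_value     VARCHAR(64),   -- high-water mark from the last good run
      last_run_ts    TIMESTAMP,     -- when that run finished
      last_row_count BIGINT,        -- rows loaded, for auditing
      status         VARCHAR(16)    -- RUNNING / SUCCESS / FAILED
    );

Each load reads last_value, pulls only rows above it, and updates the row
on success, which gives you restartability and an audit trail for free.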

So if you do not care about all the cool Sqoop features and its high
performance, and just need to ingest data, you will be fine using NiFi
processors.


Boris

On Fri, Aug 3, 2018, 15:28 Vitaly Krivoy <Vitaly_Krivoy@jhancock.com> wrote:

> We are considering using Kylo on top of NiFi. It is my understanding that
> while Kylo manages both NiFi and Spark, its designers decided to utilize
> Sqoop from Spark in order to ingest the data from relational databases. I
> am also aware that it is possible to drive Sqoop from NiFi using one of the
> processors which can run scripts. Why would Kylo designers rely on Sqoop
> rather than on NiFi? It's possible to set up a stand-alone NiFi instance
> or a NiFi cluster to do parallel database access. Sqoop will achieve
> parallelization for extraction from databases by relying on the power of
> MapReduce. We are a Hortonworks on Azure shop, so we already have
> infrastructure for both approaches. Does anyone have any feedback on why
> one approach would be preferable to the other?
