nifi-users mailing list archives

From João Henrique Freitas <joa...@gmail.com>
Subject RE: Re: ELT on Nifi
Date Fri, 07 Oct 2016 14:20:26 GMT
Hi.

Maybe a LinkedIn Databus client processor could be created to handle ETL.

On 06/10/2016 10:39, "Carlos Manuel Fernandes (DSI)" <
carlos.antonio.fernandes@cgd.pt> wrote:

> Hi Uwe,
>
>
>
> I saw you had developed a similar approach to mine. Joe Witt launched a
> challenge to build a processor based on the JSON structure I proposed.
>
>
>
> I think we can use the code of the ConvertJSONToSQL processor as a template
> for this new processor. This new processor would belong to a JSONtoSQL
> category (ConvertJSONToSQL being the first one).
>
>
>
> We can work together to reach this goal, but first we must agree on the
> JSON structure for the input.
>
>
>
> What do you think? You can contact me directly.
>
>
>
> Thanks
>
>
>
> Carlos
>
>
>
> *From:* Uwe Geercken [mailto:uwe.geercken@web.de]
> *Sent:* Tuesday, 4 October 2016 14:42
> *To:* users@nifi.apache.org
> *Subject:* Re: Re: ELT on Nifi
>
>
>
> Carlos,
>
>
>
> I think that is a good point.
>
>
>
> But I would like to bring up a little different view to it:
>
>
>
> I have developed an open-source business rule engine written in Java, and
> it is now in production at at least two larger companies - they both use
> the Pentaho ETL tool together with the rule engine. You can use the rules
> to filter/evaluate conditions, and there are also actions which execute or
> transform data. The advantage is that within Pentaho it is just a plugin,
> and the business logic (or, if you will, also the IT logic) is managed
> externally (through a web interface, possibly by users or superusers
> themselves and not by IT). This keeps a proper separation of
> responsibilities between business logic and IT logic, and the ETL process
> itself is much, much cleaner.
>
>
>
> Likewise one could think of creating a plugin for NiFi which takes a
> similar approach: you have a processor that calls the rule engine in the
> background. It runs and delivers the results back to the process. Instead
> of having complex connections between transformation processors, which
> clutter the NiFi canvas, there would be one processor for the rule engine
> (or of course multiple ones).
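>
> A minimal sketch of such a processor follows. The RuleEngine class used
> here is only a placeholder for the real rule engine API (it is stubbed out
> so the sketch compiles), property descriptors and failure handling are
> left out, and the rest is the standard NiFi processor API:
>
> import java.io.ByteArrayOutputStream;
> import java.util.Collections;
> import java.util.Set;
>
> import org.apache.nifi.flowfile.FlowFile;
> import org.apache.nifi.processor.AbstractProcessor;
> import org.apache.nifi.processor.ProcessContext;
> import org.apache.nifi.processor.ProcessSession;
> import org.apache.nifi.processor.Relationship;
> import org.apache.nifi.processor.exception.ProcessException;
>
> public class ExecuteRuleEngine extends AbstractProcessor {
>
>     public static final Relationship REL_SUCCESS = new Relationship.Builder()
>             .name("success")
>             .description("FlowFiles processed by the rule engine")
>             .build();
>
>     @Override
>     public Set<Relationship> getRelationships() {
>         return Collections.singleton(REL_SUCCESS);
>     }
>
>     @Override
>     public void onTrigger(ProcessContext context, ProcessSession session)
>             throws ProcessException {
>         FlowFile flowFile = session.get();
>         if (flowFile == null) {
>             return;
>         }
>
>         // read the incoming content and hand it to the rule engine
>         ByteArrayOutputStream content = new ByteArrayOutputStream();
>         session.exportTo(flowFile, content);
>
>         RuleEngine engine = new RuleEngine("/path/to/rule-project.zip");
>         byte[] result = engine.evaluate(content.toByteArray());
>
>         // write the transformed result back and route to success
>         flowFile = session.write(flowFile, out -> out.write(result));
>         session.transfer(flowFile, REL_SUCCESS);
>     }
>
>     // placeholder standing in for the real rule engine API
>     static class RuleEngine {
>         RuleEngine(String ruleProject) { }
>         byte[] evaluate(byte[] content) { return content; } // identity here
>     }
> }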
>
>
>
> In one of my more recent projects I have implemented the complete
> invoicing process for the company I work for using the rule engine. The
> ETL is very clean and contains only IT logic (formatting of fields,
> splitting of fields, renaming, etc.) and the rest is in external rule
> projects which contain the business logic.
>
>
>
> My thinking is that the division of responsibilities for the logic, and a
> clean ETL process (or, in the NiFi case, a clean flow diagram), is a very
> strong argument for this approach.
>
>
>
> Of course there is nothing to be said against a mixed approach (custom
> processors plus the rule engine); I just wanted to explain my point a
> little bit. Everything is available at github.com/uwegeercken.
>
>
>
> I could write the NiFi code for the processor, I guess, but I will need
> some help with testing, documentation and also with packaging the NAR file
> (I am not used to Maven and have struggled in the past to create a proper
> NAR archive).
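>
> As far as I understand it, the NAR is produced by a small extra Maven
> module that depends on the jar with the processor classes and uses the
> nifi-nar-maven-plugin. A bare-bones sketch of such a pom.xml (group ids,
> artifact ids and versions below are placeholders and would need to match
> the actual bundle):
>
> <project xmlns="http://maven.apache.org/POM/4.0.0">
>   <modelVersion>4.0.0</modelVersion>
>   <groupId>com.example</groupId>
>   <artifactId>nifi-ruleengine-nar</artifactId>
>   <version>0.1.0</version>
>   <!-- "nar" packaging makes Maven build a .nar instead of a plain .jar -->
>   <packaging>nar</packaging>
>
>   <dependencies>
>     <!-- the module that contains the actual processor classes -->
>     <dependency>
>       <groupId>com.example</groupId>
>       <artifactId>nifi-ruleengine-processors</artifactId>
>       <version>0.1.0</version>
>     </dependency>
>   </dependencies>
>
>   <build>
>     <plugins>
>       <plugin>
>         <groupId>org.apache.nifi</groupId>
>         <artifactId>nifi-nar-maven-plugin</artifactId>
>         <version>1.2.0</version>
>         <extensions>true</extensions>
>       </plugin>
>     </plugins>
>   </build>
> </project>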
>
>
>
> Greetings,
>
>
>
> Uwe
>
>
>
> *Sent:* Tuesday, 04 October 2016 at 04:48
> *From:* "Matt Burgess" <mattyb149@apache.org>
> *To:* users@nifi.apache.org
> *Subject:* Re: ELT on Nifi
>
> Carlos,
>
>
>
> The extensible nature of NiFi, whether the overall architecture was
> intended for ETL/ELT and/or RDBMS/DW concepts or not, means that many of
> these kinds of operations are welcome (but possibly not yet present) in
> NiFi. Some might warrant framework changes, but for a good portion, many
> RDBMS/DW processors are possible but just haven't been added/contributed
> yet. In my experience, ETL/ELT tools have focused mainly on this kind of
> "processor" and in contrast can't handle the level of throughput, data
> formats, provenance/lineage, security, and/or data integrity that NiFi can.
> In exchange, NiFi doesn't have as many of the RDBMS/DW-specific processors
> available at this time. I see a few categories (please feel free to
> add/change/delete/discuss), mostly having to do with tabular (row-oriented,
> character-delimited) data:
>
>
>
> 1) Row-level operations. This includes projections (select fields from
> row), alter fields (change timestamp of column 'last_updated', e.g.), add
> column(s), replace-with-lookup, etc.
>
> 2) Table-level operations. This includes joins, grouping/aggregates,
> transposition, etc.
>
> 3) Composition/Application of the other two. This includes normalization &
> denormalization (star/snowflake schemas, e.g.), dimension updates
> (Kimball's SCD Type 2, e.g.), etc.
>
> 4) Bulk Loading. These usually involve custom code (although in many cases
> for NiFi you can deploy a command-line tool for bulk loading to a DB and
> use ExecuteProcess or ExecuteStreamCommand to make it happen). These are
> usually native processes for getting lots of data into the DB using an
> end-run around their own interfaces, possibly bypassing mechanisms that
> NiFi embraces, such as provenance. But they are often faster than their SQL
> interface counterparts for large data ingest.
>
> 5) Transactions. This involves executing a number of SQL statements as an
> atomic group (i.e. BEGIN, a bunch of INSERTs, COMMIT). Not all DBs support
> this (and many have their own dialects for such things).
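>
> To make (5) concrete, plain JDBC already expresses that grouping; a
> minimal sketch (driver URL, credentials, tables and columns are made up
> for illustration):
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.SQLException;
> import java.sql.Statement;
>
> public class TransactionSketch {
>     public static void main(String[] args) throws SQLException {
>         try (Connection conn = DriverManager.getConnection(
>                 "jdbc:postgresql://localhost/dw", "user", "pass")) {
>             conn.setAutoCommit(false); // start the atomic group (implicit BEGIN)
>             try (Statement stmt = conn.createStatement()) {
>                 stmt.executeUpdate("delete from foo");
>                 stmt.executeUpdate(
>                     "insert into foo(c1, c2) select c1, sum(c2) from bar group by c1");
>                 conn.commit();   // both statements become visible together
>             } catch (SQLException e) {
>                 conn.rollback(); // neither statement takes effect
>                 throw e;
>             }
>         }
>     }
> }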
>
>
>
> That's a lot of feature surface to cover! Luckily we have an ever-growing
> community filled with folks representing a whole spectrum of experience and
> a shared passion for data :)  I am very interested in your thoughts on
> where NiFi could improve on these (or other) fronts with respect to
> ETL/ELT; I think we can get some good discussions (and code contributions!)
> going on this. Alternatively, if you'd like to pursue a discussion on how
> to offload data transformations, I'm sure the community has thoughts on
> that as well.
>
>
>
> Regards,
>
> Matt
>
>
>
> P.S. I didn't include push-down optimization on the list because of its
> complexity, and because in NiFi terms it involves things like dynamic flow
> rewrites and other magic that IMHO is against the design principles of NiFi
> itself (simplicity and accountability, e.g.).
>
>
>
> On Mon, Oct 3, 2016 at 2:25 PM, Carlos Manuel Fernandes (DSI) <
> carlos.antonio.fernandes@cgd.pt> wrote:
>
> Hi all,
>
>
>
> When I saw NiFi for the first time, I tried to build a classical ETL/ELT
> flow, and this question is recurrent among new users.
>
>
>
> NiFi has very good processors for *Extract* and *Load*; the problem arises
> with *Transform*, because ETL/ELT tools have specific “processors” (e.g.
> map, SCD, etc.) bound to DW concepts and sometimes bound to a specific
> database (e.g. SCDNetezza). The Transform processors in NiFi are general
> purpose and not correlated with these concepts. The immediate solution is
> to create a lot of custom script processors, but then the ELT metadata
> (SQL) turns into attributes or code of those processors, which is not an
> ideal solution.
>
>
>
> But if we put the *Transform* logic outside of NiFi, for example in some
> JSON structure, then it is relatively easy to construct an ELT NiFi
> template capable of running generic ELT flows.
>
>
>
> Example of an ELT JSON structure (the “steps” inside the “flow” are to be
> executed by PutSQL in the same transaction):
>
> {
>     "Transformer": [{
>         "name": "foo1",
>         "type": "Map",
>         "description": "Summarize the table foo from table bar",
>         "flow": [{
>             "step": 1,
>             "description": "delete all data",
>             "stmt": "delete from foo"
>         }, {
>             "step": 2,
>             "description": "Count f2 by f1",
>             "stmt": "insert into foo(c1, c2) select c1, sum(c2) from bar group by c1"
>         }]
>     }, {
>         "name": "foo2",
>         "type": "SCD - Slowly Changing Dimensions type 1",
>         "description": "Update a prod table based on stage table",
>         "flow": [{
>             "step": 1,
>             "description": "Process type 1",
>             "stmt": "Update Prod Set Prod.columns = Stage.Columns From Stage Inner Join Prod on Stage.key = Prod.key Where Stage.IsType1 = 1"
>         }]
>     }]
> }
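>
> Just to illustrate how a processor (or an ExecuteScript step) could walk
> this structure, here is a rough sketch using Jackson; the class and method
> names are only an example, not part of the proposal, and the steps are
> assumed to appear already in ascending "step" order:
>
> import java.util.ArrayList;
> import java.util.List;
>
> import com.fasterxml.jackson.databind.JsonNode;
> import com.fasterxml.jackson.databind.ObjectMapper;
>
> public class EltJsonReader {
>
>     // collect the SQL statements of one transformer, in step order,
>     // so they can be handed to PutSQL as a single transaction
>     public static List<String> statementsFor(String json, String transformerName)
>             throws Exception {
>         JsonNode root = new ObjectMapper().readTree(json);
>
>         List<String> statements = new ArrayList<>();
>         for (JsonNode transformer : root.get("Transformer")) {
>             if (!transformerName.equals(transformer.get("name").asText())) {
>                 continue;
>             }
>             // "flow" holds the ordered steps; each step carries one statement
>             for (JsonNode step : transformer.get("flow")) {
>                 statements.add(step.get("stmt").asText());
>             }
>         }
>         return statements;
>     }
> }
>
> Each statement could then become one FlowFile for PutSQL, using (if I
> recall correctly) its fragment attributes so the whole group commits
> together.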
>
>
>
> Example of a NiFi template which executes that JSON structure:
>
>
>
>
>
>
>
> Does this make sense? Give me feedback.
>
>
>
> Carlos
>
>
>
>
>
>
>
