nifi-users mailing list archives

From "Carlos Manuel Fernandes (DSI)" <carlos.antonio.fernan...@cgd.pt>
Subject ELT on Nifi
Date Mon, 03 Oct 2016 18:25:42 GMT
Hi all,

When I first saw NiFi, I tried to build a classical ETL/ELT flow, and the question of how to do
this comes up repeatedly among new users.

NiFi has very good processors for Extract and Load. The problem arises with Transform, because
ETL/ELT tools have specific "processors" (e.g. Map, SCD) bound to DW concepts, and sometimes
bound to a specific database (e.g. SCDNetezza). The transform processors in NiFi are general
purpose and not tied to these concepts. The immediate solution is to create a lot of custom
script processors, but then the ELT metadata (the SQL) becomes attributes or code of the
processors, which is not an ideal solution.

But if we put the Transform logic outside NiFi, for example in a JSON structure, then it is
relatively easy to construct an ELT NiFi template capable of running generic ELT flows.

Example of an ELT JSON structure (the "steps" inside each "flow" are to be executed by PutSQL
in the same transaction):
{
       "Transformer": [{
             "name": "foo1",
             "type": "Map",
             "description": "Summarize the table foo from table bar",
             "flow": [{
                    "step": 1,
                    "description": "delete all data",
                    "stmt": "delete from  foo"
             }, {
                    "step": 2,
                    "description": "Sum c2 by c1",
                    "stmt": "insert into foo(c1, c2) select c1,sum(c2) from bar group by c1"
             }]
       }, {
             "name": "foo2",
             "type": "SCD - Slowly Changing Dimensions type 1",
             "description": "Update a prod table based on stage table",
             "flow": [{
                    "step": 1,
                    "description": "Process type 1",
                    "stmt": "Update Prod Set Prod.columns = Stage.Columns From Stage Inner Join Prod on Stage.key = Prod.key Where Stage.IsType1 = 1"
             }]
       }]
}
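
To make the intended semantics concrete outside NiFi, here is a minimal Python sketch of what
"execute the steps of a flow in the same transaction" could look like. It is only an
illustration under assumptions: it uses an in-memory SQLite database instead of the real
target database, the function name run_transformers is hypothetical, and the JSON is a
shortened copy of the first transformer above. It is not how PutSQL is implemented.

```python
import json
import sqlite3

# Shortened copy of the structure above (first transformer only).
# The steps are deliberately listed out of order to show the sorting.
ELT_JSON = """
{
  "Transformer": [{
    "name": "foo1",
    "type": "Map",
    "flow": [
      {"step": 2, "stmt": "insert into foo(c1, c2) select c1, sum(c2) from bar group by c1"},
      {"step": 1, "stmt": "delete from foo"}
    ]
  }]
}
"""


def run_transformers(conn, spec):
    """Run each transformer's steps in "step" order, one transaction per transformer.

    Hypothetical helper: returns the names of the transformers that committed.
    """
    completed = []
    for transformer in spec["Transformer"]:
        steps = sorted(transformer["flow"], key=lambda s: s["step"])
        try:
            with conn:  # commits on success, rolls back if any step raises
                for step in steps:
                    conn.execute(step["stmt"])
        except sqlite3.Error as exc:
            print(f"{transformer['name']} rolled back: {exc}")
            continue
        completed.append(transformer["name"])
    return completed


# Assumed sample tables and data, only so the sketch can run on its own.
conn = sqlite3.connect(":memory:")
conn.execute("create table bar (c1 text, c2 integer)")
conn.execute("create table foo (c1 text, c2 integer)")
conn.executemany("insert into bar values (?, ?)", [("a", 1), ("a", 2), ("b", 5)])
conn.execute("insert into foo values ('stale', 0)")
conn.commit()

done = run_transformers(conn, json.loads(ELT_JSON))
rows = conn.execute("select c1, c2 from foo order by c1").fetchall()
print(done, rows)
```

With the sample data above, foo ends up holding only the summarized rows from bar. If any step
of a flow fails, the earlier steps of that flow are rolled back, so a failed "insert" would not
leave foo empty after the "delete".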

Example of a NiFi template that executes that JSON structure:

[Image: screenshot of the NiFi template (inline attachment image002.png)]


Does this make sense? Please give me feedback.

Carlos



