nifi-dev mailing list archives

From "Edenfield, Orrin" <>
Subject RE: Multiple dataflows with sub-flows and version control
Date Mon, 05 Jan 2015 15:44:15 GMT
Thank you everyone for the help with explaining the logical approach to multiple flows that
I'll need to take - since it is different from the "multiple ETL job" history I'm accustomed
to.  I think I'm starting to understand - and this can work similarly to the ETL tools I'm
familiar with (Informatica, Sterling Integrator, Talend, Pentaho, etc.).

I'm still trying to start very simply with a compression of the input data before landing
into HDFS - so I've set up multiple flows between the GetFile and the PutHDFS and this seems
to be working as I expect (see attached screenshot).
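In ETL-tool terms, what that CompressContent step is doing to each file's content can be sketched in plain Python - nothing NiFi-specific here, the directory layout and helper name are just illustrative:

```python
import gzip
import shutil
from pathlib import Path

def compress_to_landing(src: Path, landing_dir: Path) -> Path:
    """Gzip-compress a picked-up file before it lands - roughly the
    content transformation a GetFile -> CompressContent -> PutHDFS
    chain performs, minus the actual HDFS write."""
    landing_dir.mkdir(parents=True, exist_ok=True)
    dst = landing_dir / (src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```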

I will need to think some more about how this can be used when it comes to our existing ETL
pipelines but I think IdentifyMimeType, EvaluateRegularExpression, HashContent, MonitorActivity,
ReplaceTextWithMapping, RouteOnContent, and even SegmentContent may get us a long way.
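For instance, the core idea behind RouteOnContent - match each piece of content against named patterns and send it down the first matching relationship - is simple to sketch. The relationship names and patterns below are made up, not anything NiFi ships with:

```python
import re

# Hypothetical routing rules: relationship name -> content pattern,
# roughly what RouteOnContent configures as dynamic properties.
ROUTES = {
    "errors":  re.compile(r"\bERROR\b"),
    "metrics": re.compile(r"^\d+,\d+"),
}

def route_on_content(text: str) -> str:
    """Return the first relationship whose pattern matches,
    else fall through to 'unmatched'."""
    for relationship, pattern in ROUTES.items():
        if pattern.search(text):
            return relationship
    return "unmatched"
```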

Thank you.

Orrin Edenfield

-----Original Message-----
From: [] 
Sent: Monday, January 05, 2015 10:07 AM
Subject: RE: Multiple dataflows with sub-flows and version control

NiFi is very capable of multiple data flows; in fact I have used it this way for some time.
Whether it is getting data from a source, doing some processing on it, and sending it to
multiple destinations, or sending data from multiple sources to a single destination, or
having multiple flows defined at one time - it is quite flexible. As for version control, I
have not played with this, but the version I last used had a limited capability.

If you want to create a new flow, but keep the old running, just instantiate another flow
in parallel, NiFi doesn't really care that they are the same.

To answer 1: yes, there are capabilities (at least in the version I used) to test the data
and alter the flow based on the type. 2 is answered above. Also, creating additional processors
is not that difficult if you have the documentation; we actually created ours by reverse engineering
NiFi processors, and this would be another way to solve your issues.

Ralph Spangler
From: Edenfield, Orrin []
Sent: Friday, January 02, 2015 11:37 AM
Subject: Multiple dataflows with sub-flows and version control

Hello everyone - I'm new to the mailing list and I've tried to search the JIRA and mailing
list to see if this has already been addressed and didn't find anything so here it goes:

When I think about the capabilities of this tool I instantly think of ETL-type tools. So the
questions/comments below are likely to be coming from that frame of mind - let me know if
I've misunderstood a key concept of NiFi as I think that could be possible.

Is it possible to have the NiFi service set up and running and allow for multiple dataflows to
be designed and deployed (running) at the same time?  So far in my testing I've found that
I can get the NiFi service up and functioning as expected on my cluster edge node, but I'd like
to be able to design multiple dataflows for the following reasons.

1. I have many datasets that will need some of the same flow actions but not all of them.
I'd like to componentize the flows and possibly have multiple flows cascade from one to another.
For example:  I will want all data to flow into an HDFS endpoint but dataset1 will be coming
in as delimited data so it can go directly into the GetFile processor while I need dataset2
to go through a CompressContent processor first.

2. Because I have a need in #1 above - I'd like to be able to design multiple flows (specific
to a data need or component flows that work together) and have them all be able to be deployed
(running) concurrently.
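The split in #1 - delimited data passing straight through while other datasets get compressed first - ultimately comes down to inspecting the content before landing. A minimal sketch of that decision (a gzip magic-byte check, purely illustrative, not NiFi code):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream

def prepare_for_landing(payload: bytes) -> bytes:
    """Compress the payload unless it is already gzip'd, so every
    dataset lands in the HDFS endpoint in one consistent format."""
    if payload[:2] == GZIP_MAGIC:
        return payload  # dataset already compressed - pass through
    return gzip.compress(payload)  # delimited data - compress first
```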

Also - it would be nice to be able to version control these designed flows so I can have one
flow running while modifying a version 2.0 of that flow, and then once the updates have been
made I have a mechanism to safely and effectively shut down flow.v1 and start up flow.v2.
Thank you.

Orrin Edenfield
