nifi-dev mailing list archives

From Mark Payne <>
Subject Re: Multiple dataflows with sub-flows and version control
Date Fri, 02 Jan 2015 17:33:53 GMT

Within NiFi you can create many different dataflows within the same graph and run them concurrently.
We've built flows with several hundred Processors. Data can flow between flows simply by
connecting the Processors together.

If you want to separate the flows logically because it makes more sense to you to visualize
them that way, you may want to use Process Groups. 

I'm on my cell phone right now so I cannot draw up an example for you, but I will this afternoon
when I have a chance. The basic idea is that for #1 you would have:

GetFile -> PutHDFS

And alongside that, another GetFile -> CompressContent -> the same PutHDFS.

In this case you could even do it with a single flow:

GetFile -> IdentifyMimeType (to check whether the content is compressed) -> CompressContent
(set to decompress, with the compression format taken from the mime type identified by the
previous processor) -> PutHDFS
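As an illustrative sketch (not NiFi code), here is roughly what the IdentifyMimeType + CompressContent pair does in that flow, written in Python. The function and magic-byte table are hypothetical names for this example; it assumes gzip or bzip2 inputs and passes anything else through unchanged:

```python
import bz2
import gzip

# Magic bytes the "identify mime type" step would recognize.
MAGIC = {
    b"\x1f\x8b": gzip.decompress,  # application/gzip
    b"BZh": bz2.decompress,        # application/x-bzip2
}

def decompress_if_needed(data: bytes) -> bytes:
    """Mimic IdentifyMimeType -> CompressContent (decompress mode):
    detect the compression format from the content's magic bytes and
    decompress; pass uncompressed content through unchanged."""
    for magic, decompress in MAGIC.items():
        if data.startswith(magic):
            return decompress(data)
    return data
```

The point is that both compressed and uncompressed datasets can share one downstream path (the PutHDFS step), because the decompression branch is a no-op for content that is already plain.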

With regards to #2:
You can build the new flow right alongside the old flow. When you are ready to switch, simply
change the connection to send data to the new flow instead of the old one.

Again, I'll put together some examples this afternoon with screenshots that should help.
Let me know if this helps or if it creates more questions (or both :))


Sent from my iPhone

> On Jan 2, 2015, at 11:37 AM, Edenfield, Orrin <> wrote:
> Hello everyone - I'm new to the mailing list. I've tried to search the JIRA and mailing
> list to see if this has already been addressed and didn't find anything, so here it goes:
> When I think about the capabilities of this tool I instantly think of ETL-type tools,
> so the questions/comments below are likely coming from that frame of mind - let me know
> if I've misunderstood a key concept of NiFi, as I think that could be possible.
> Is it possible to have the NiFi service set up and running and allow for multiple dataflows
> to be designed and deployed (running) at the same time? So far in my testing I've found that
> I can get the NiFi service up and functioning as expected on my cluster edge node, but I'd like
> to be able to design multiple dataflows for the following reasons.
> 1. I have many datasets that will need some of the same flow actions, but not all of them.
> I'd like to componentize the flows and possibly have multiple flows cascade from one to another.
> For example: I will want all data to flow into an HDFS endpoint, but dataset1 will be coming
> in as delimited data so it can go directly into the GetFile processor, while I need dataset2
> to go through a CompressContent processor first.
> 2. Because of the need in #1 above - I'd like to be able to design multiple flows (specific
> to a data need, or component flows that work together) and have them all be able to be deployed
> (running) concurrently.
> Also - it would be nice to be able to version control these designed flows, so I can have
> one flow running while modifying a version 2.0 of that flow, and then once the updates have
> been made I have a mechanism to safely and effectively shut down flow.v1 and start up flow.v2.
> Thank you.
> --
> Orrin Edenfield
