nifi-users mailing list archives

From Aldrin Piri <>
Subject Re: Collaboration?
Date Tue, 17 Nov 2015 14:58:05 GMT
Corey has a lot of awesome points and best practices.

From a slightly different vantage point, many of my core and early
experiences with NiFi were on teams of only two or three people where we
handled some large volume flows for various groups and organizations.
Typically, these flows took form fairly organically.  We started by
receiving data and performing simple routing to send it to various other
systems/organizations.  Over time, we would incrementally add new
functionality through extensions as new use cases and ways of interpreting
and structuring the data arose. Overall, the framework made it very easy
for a small team to go from nothing to a large dataflow in manageable
steps as we transitioned away from some of our existing infrastructure and
functionality. It also gave us a holistic view of the entire dataflow
under our management.

When you are first getting started, there is certainly a larger state of
flux. As your flows take form, the interactions become less frequent, yet
you are still able to easily morph and evolve the flow as new situations
and needs arise.

On Tue, Nov 17, 2015 at 8:58 AM, Corey Flowers <> wrote:

> Good morning!
> We have used it for a while in multiple groups of 6-9 people each. Usually
> that is what it takes to manage a few large scale clusters. I think our
> team of 9 currently supports 12 clusters and our team of 6 supports a single
> giant cluster, but they also support quite a few clouds after distribution.
> A few things we have learned over time and terms we have:
> 1) Processor Creep - This is where members of a team just template
> sections of the graph and reuse them in another area of the graph, or build
> new flows without actually incorporating them into existing flows.
> This is ok, but if you do this a lot, you end up constantly adding
> processors and never removing any, meaning your graph only grows in size and
> loses efficiency. Sooner or later, someone has to go in and redraw the
> graph to fix all the inefficiencies.
> 2) Sharing the graph - this isn't really a problem for us. We usually just
> see who the last person to change the graph is and ask them if they are
> done before adding to the graph. Once the main area of the graph is built,
> adding new flows is very fast and control of the graph isn't as much of an
> issue.
> 3) Groups hitting your provenance locally - This can be really bad
> depending on how you are set up. You definitely want something like Ambari
> that you post to, and you should not let every customer you support reach
> into your graph to do provenance lookups. If anything, the team directly
> responsible should be the ones doing the searches for those teams; don't
> give the external teams access to do the searches. Our graph slowed to a
> crawl when we had about 50 groups each doing local provenance lookups at
> the same time. It was a mess. I am not sure if this is still an issue as we
> haven't beaten it up to that level in the last few releases.
> 4) Flow.xml.gz version control - I actually don't recommend this at all.
> If you want to template sections of your graph and then version control
> those, then ok, but in general it has not been a good practice. I am sure
> there is a good way to accomplish this but so far, we haven't found a good
> solution. So let's look at an example real quick:
>  Let's say you have a processor set and version control that, then you add
> 15 processors, run data, then decide you have to revert.
> 1) Your graphs for that section will have to be empty of data to revert or
> you will lose data
> 2) In the time it has taken you to test those 15 processors, there were N
> changes to the graph (remember, you aren't just reverting your
> section of the graph. It is the whole thing). How do you keep the delta of
> the changes outside of your section?
> 3) Stopping ingest in one section (to make sure there is no data in your
> section) may cause others to back up.
> 4) To revert back, if you are clustered, you would need to reinsert the
> flow.xml.gz into the flow.tar or replace the flow.tar, remove all the
> flow.xml.gz's from the cluster nodes and restart your entire cluster.
> Now if you have built your flows into groups and template those groups,
> you could just stop a group, import the template, start the newly inserted
> template and remove the old processors, without the flow.xml.gz version
> control.
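The "delta" question in step 2 above can be made concrete with a small script. What follows is only a rough sketch, not something from NiFi itself: it assumes the decompressed flow.xml contains `<processor>` elements that each carry `<id>` and `<name>` children, and the function names `processor_ids` and `flow_delta` are made up for illustration. It lists which processors were added or removed between two snapshots, so you can at least see what a whole-graph revert would throw away.

```python
import xml.etree.ElementTree as ET


def processor_ids(flow_xml: str) -> dict:
    """Map processor id -> name for every <processor> element found.

    Assumption: each <processor> has <id> and <name> child elements.
    """
    root = ET.fromstring(flow_xml)
    found = {}
    for proc in root.iter("processor"):
        proc_id = proc.findtext("id")
        if proc_id is not None:
            found[proc_id] = proc.findtext("name", default="")
    return found


def flow_delta(old_xml: str, new_xml: str) -> dict:
    """Return the processor ids added and removed between two snapshots."""
    old = processor_ids(old_xml)
    new = processor_ids(new_xml)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
    }
```

This only compares which processors exist. Changed properties, connections and queued data are not covered, so it does not make a revert safe; it just shows how much of the graph moved while you were testing.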
> If you are doing version control from the standpoint of having a kind of
> "undo", it doesn't really work. If you are doing it to have a backup in
> case you lose a server, then any central storage point should work just
> fine, but if you are clustered you are covered just by how it is
> implemented.
> I am sure there are a couple of things I missed but just throwing some
> lessons learned out there.
> Later!
> On Tue, Nov 17, 2015 at 4:20 AM, Juan Jose Escobar <> wrote:
>> Hello, Darren,
>> We are using NiFi collaboratively but the team of people using it is not
>> yet that large, so we can coordinate without problems even in production
>> clusters. In some cases, when you are performing some action on the UI you
>> may get "This NiFi instance has been updated by '<user>'. Please refresh to
>> synchronize the view" (a warning is also shown in the toolbar even if you
>> do nothing). Hitting refresh is okay-ish for us for now...
>> You can organize your flow in different dataflows that run concurrently.
>> Flows can be independent or data can flow between them just by connecting
>> processors. Note that "flows" here are just a logical division, but there
>> is nothing in NiFi separating them as far as I know. You can still organize
>> flows using Process Groups so that everything is more structured, at least
>> visually.
>> For version control we keep the flow in git. You can request a backup
>> using the API and move that into Git. You can have a cron job script for
>> that. You may decompress the flow to have an XML format, just be strict on
>> changes.
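A cron-driven snapshot along these lines could look roughly like the sketch below. It is an illustration under assumptions, not the script described above: instead of requesting a backup through the API, it reads a flow.xml.gz from a local path you supply, and the function names `decompress_flow` and `commit_snapshot` are hypothetical. It stores the decompressed XML so that Git diffs stay readable, and it only commits when the content has changed.

```python
import gzip
import subprocess
from pathlib import Path


def decompress_flow(flow_gz: Path, target: Path) -> bool:
    """Write the decompressed flow to target.

    Returns True if target was created or changed, False if the
    content was already identical (so there is nothing to commit).
    """
    with gzip.open(flow_gz, "rb") as src:
        data = src.read()
    if target.exists() and target.read_bytes() == data:
        return False
    target.write_bytes(data)
    return True


def commit_snapshot(repo_dir: Path, filename: str = "flow.xml",
                    message: str = "NiFi flow snapshot") -> None:
    """Stage and commit the snapshot in an existing Git working copy.

    Assumes git is installed and repo_dir is already a repository
    with a commit identity configured.
    """
    subprocess.run(["git", "add", filename], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
```

A cron entry would then run a few lines that call `decompress_flow(Path(<your flow.xml.gz>), repo / "flow.xml")` and, only when it returns True, `commit_snapshot(repo)`. Both paths depend on your installation and are left for you to fill in.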
>> Also, if you want to isolate flows in version control, you can save the
>> specific parts as templates and put that in Git (we do that only for sub
>> flows that are not part of the standard setup, kind of optional).
>> Separating the templates that form the actual flow may help your team to
>> work concurrently, but we are not doing that since we haven't hit
>> concurrent access problems yet.
>> Curious to hear about better ways to do these things...
>> Regards
>> On Mon, Nov 16, 2015 at 6:46 PM, darren <> wrote:
>>> Hi
>>> How are people using NiFi collaboratively with various people designing
>>> flows that may or may not be used together?
>>> Also how are people doing version control of flows, e.g. in Git?
>>> Thanks
>>> D
>>> Sent from my Verizon Wireless 4G LTE smartphone
> --
> Corey Flowers
> Vice President, Onyx Point, Inc
> (410) 541-6699
> -- This account not approved for unencrypted proprietary information --
