nifi-dev mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: NiFi Registry feature branch workflow (possibly via Git-Backed MetadataService, FlowPersistenceProvider)
Date Tue, 05 Mar 2019 19:25:00 GMT
Just to confirm that I understand the reason for the git metadata
service: it's so that you can branch the metadata and flow content
together at the same time?

For example, you have one bucket with one flow with two versions, and
all the metadata is in the metadata service and content of the flow in
the persistence provider, and if they are both in git then you can
fork them both and have another registry instance point to those forks
and continue to evolve independently?

I think some of my hesitations are the following...

- Maybe not a huge issue, but it will definitely increase the amount
of work to maintain two implementations. For example, I've currently
added about 25 methods related to extension registry stuff (some in
master, some on a soon to be submitted branch), and that was a lot of
work just for the DB implementation, and then I'd have to implement
those same 25 methods for the git implementation. Then there's also
additional testing to ensure that every release works with both
configurations.

- I'd like to know more about how we'd implement query methods against
git, like some of the methods that are in there now such as "get flows
by bucket id" and others, or some new stuff for extension registry
like "get extension by tag". Would the entire metadata need to be
loaded into memory to efficiently query it?
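To make the in-memory question concrete, here is a minimal Java sketch. It assumes a hypothetical `FlowMetadata` record whose entries have already been read from files in the git working tree. Every entry stays in memory, with a secondary index by bucket id, so a "get flows by bucket id" style lookup is a map read rather than a scan of the repository. The class and method names are illustrative only and are not part of NiFi Registry's `MetadataService`.

```java
import java.util.*;

// Hypothetical metadata entry, assumed to be parsed from a file in the git working tree.
record FlowMetadata(String flowId, String bucketId, String name) {}

// Sketch: hold all metadata in memory and build secondary indexes up front,
// so queries do not have to walk the repository on every call.
final class InMemoryFlowIndex {
    private final Map<String, FlowMetadata> byId = new HashMap<>();
    private final Map<String, List<FlowMetadata>> byBucket = new HashMap<>();

    InMemoryFlowIndex(Collection<FlowMetadata> loadedFromGit) {
        for (FlowMetadata m : loadedFromGit) {
            byId.put(m.flowId(), m);
            byBucket.computeIfAbsent(m.bucketId(), k -> new ArrayList<>()).add(m);
        }
    }

    // Equivalent of a "get flows by bucket id" query; returns an immutable copy.
    List<FlowMetadata> getFlowsByBucket(String bucketId) {
        return List.copyOf(byBucket.getOrDefault(bucketId, List.of()));
    }

    Optional<FlowMetadata> getFlowById(String flowId) {
        return Optional.ofNullable(byId.get(flowId));
    }
}
```

The trade-off is that memory grows with the total amount of metadata, every index has to be rebuilt or updated whenever the checked-out branch changes, and each new query (such as "get extension by tag") needs yet another index.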

- The git metadata feels more specific to flows since flow content can
be stored in git, but for extensions the initial implementation has a
file-system store and then there would likely be implementations that
use S3 or some other object store. So in that case does it really make sense
to use git for all of the metadata when only some of the content is
stored in git?  I think I'd feel better about it if there was some way
to separate the metadata for different types of items, but I don't see
a way to do that unless we make a more fundamental change to how
everything is structured. Right now buckets are the highest level, and
within a bucket there can be any type of item. To completely separate
everything, I think you'd first have to choose the type of item, then
there would be buckets within that type of item, and then the items
themselves. Every operation would be scoped to a type of item so you
would only ever operate against flows or extensions, but not both, and
then you would know which metadata service and persistence provider to
use. This would be a huge change though.
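The type-first structure described above could look roughly like the following Java sketch. `ItemType`, `MetadataStore`, `ContentStore`, `Backend`, and `TypeScopedRegistry` are hypothetical names used only for illustration. The point is that once every operation names its item type, the registry can look up the matching metadata service and persistence provider pair directly.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical item types; the current registry puts all item types inside buckets instead.
enum ItemType { FLOW, EXTENSION }

// Hypothetical minimal store interfaces, standing in for a metadata service and a persistence provider.
interface MetadataStore { String describe(); }
interface ContentStore { String describe(); }

// Pairs the metadata store and content store used for one item type.
record Backend(MetadataStore metadata, ContentStore content) {}

final class TypeScopedRegistry {
    private final Map<ItemType, Backend> backends = new EnumMap<>(ItemType.class);

    void register(ItemType type, Backend backend) {
        backends.put(type, backend);
    }

    // Every operation supplies its item type first, so the right pair is always known.
    Backend backendFor(ItemType type) {
        Backend b = backends.get(type);
        if (b == null) {
            throw new IllegalStateException("No backend configured for " + type);
        }
        return b;
    }
}
```

With this shape, flows could be paired with git-backed metadata and content while extensions use a database plus an object store, at the cost of the restructuring noted above.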

All that being said, if we do support multiple implementations of the
metadata service, I would suggest we don't claim it to be a public
extension point. This way we can still freely modify it as needed, but
can just switch the impl through something in
nifi-registry.properties.
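A minimal sketch of selecting an internal implementation from configuration, assuming a hypothetical property key `nifi.registry.metadata.impl` (chosen for illustration, not an existing setting) and hypothetical backend classes. Because the interface would stay internal rather than becoming a public extension point, the selector only knows about implementations shipped with the registry.

```java
import java.util.Properties;

// Hypothetical internal interface (deliberately not a public extension point).
interface MetadataBackend {
    String name();
}

// Hypothetical implementations that would ship with the registry itself.
final class DatabaseMetadataBackend implements MetadataBackend {
    public String name() { return "database"; }
}

final class GitMetadataBackend implements MetadataBackend {
    public String name() { return "git"; }
}

final class MetadataBackendSelector {
    // Hypothetical key, chosen for illustration only.
    static final String KEY = "nifi.registry.metadata.impl";

    // Falls back to the database implementation when the key is absent.
    static MetadataBackend select(Properties props) {
        String impl = props.getProperty(KEY, "database").trim();
        return switch (impl) {
            case "database" -> new DatabaseMetadataBackend();
            case "git" -> new GitMetadataBackend();
            default -> throw new IllegalArgumentException("Unknown metadata implementation: " + impl);
        };
    }
}
```

In this sketch, switching would be a single line such as `nifi.registry.metadata.impl=git` in nifi-registry.properties, while the interface itself could still change freely between releases.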



On Tue, Mar 5, 2019 at 1:29 PM Bryan Rosander <bryanrosander@gmail.com> wrote:
>
> Thanks for the quick response!
>
> As far as the use case goes I was thinking of trying to support multiple
> branches at the same time, not necessarily just branch -> test -> merge
> (similar to how multiple developers can branch off in git and still
> reconcile their changes when they're ready).
>
> I've done some reading [1][2][3] on approaches to SDLC in NiFi but they
> mostly seem to be geared towards a linear promotion model (DEV -> STAGING
> -> PROD or similar).  Is there a recommended best practice for multiple
> concurrent lines of development?
>
> I can see how in the longer run it would be nice to encapsulate a full
> development lifecycle within the application's functionality, but I
> thought that having all of its state as (versionable) files on disk would
> allow for more flexibility around developer workflow without needing to
> account for it within the UI.
>
> Assuming that the implementation of a git-backed MetadataService (and
> pluggable MetadataService(s) in general) is achievable, do you have
> objections to it as a general concept? (I think that alone would go a long
> way towards addressing our use case and wouldn't mind taking a stab at it
> :) )
>
> [1] https://bryanbende.com/development/2018/06/20/apache-nifi-registry-0-2-0
> [2] https://www.slideshare.net/Hadoop_Summit/sdlc-with-apache-nifi
> [3] https://github.com/Chaffelson/nipyapi/blob/master/nipyapi/demo/fdlc.py
>
> Thanks,
> Bryan
>
> On Tue, Mar 5, 2019 at 12:08 PM Bryan Bende <bbende@gmail.com> wrote:
>
> > Bryan,
> >
> > The idea of branching to test changes is definitely interesting. A
> > couple of thoughts I had while thinking about it...
> >
> > 1) The FlowPersistenceProvider is an extension point, so how would
> > branching work if someone is not using the GitFlowPersistenceProvider?
> > I guess it could be disabled if the FlowPersistenceProvider is not an
> > instance of GitFlowPersistenceProvider (or some Branchable interface)?
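One way to express "disable branching unless the provider supports it" is an opt-in capability interface checked at runtime. In this Java sketch, `FlowStorage` is a simplified stand-in for the registry's provider interface, and `Branchable`, `GitBackedStorage`, and `BranchingSupport` are hypothetical names; only the `instanceof` check is the point.

```java
import java.util.Optional;

// Simplified stand-in for a flow persistence provider (hypothetical, much smaller than the real interface).
interface FlowStorage {
    void saveFlowContent(String bucketId, String flowId, int version, byte[] content);
}

// Hypothetical opt-in capability, as floated in the thread.
interface Branchable {
    void createBranch(String branchName);
}

// Hypothetical git-backed provider that opts in to branching.
final class GitBackedStorage implements FlowStorage, Branchable {
    String currentBranch = "master";

    @Override
    public void saveFlowContent(String bucketId, String flowId, int version, byte[] content) {
        // Persistence is omitted in this sketch.
    }

    @Override
    public void createBranch(String branchName) {
        // Only records the name; a real provider would create the branch in its repository.
        currentBranch = branchName;
    }
}

final class BranchingSupport {
    // Branching features are offered only when the configured provider implements Branchable.
    static Optional<Branchable> branchingFor(FlowStorage provider) {
        if (provider instanceof Branchable branchable) {
            return Optional.of(branchable);
        }
        return Optional.empty();
    }
}
```

A UI or REST layer could then hide branching actions whenever `branchingFor` returns an empty `Optional`.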
> >
> > Personally I prefer to treat the git persistence provider as just
> > another binary storage for flows, with the added bonus that if you use
> > tools like GitHub you get some nice visuals and an automatic backup.
> > For most functionality I would prefer it still happen through our
> > applications. For example, many users want to diff flows using GitHub,
> > but it would be preferable to build a proper diff capability into NiFi
> > and NiFi Registry that worked regardless of the persistence provider
> > (there is already a REST endpoint; we just need some UI work).
> >
> > 2) As far as the metadata service... One idea we thought of early on
> > was to have a metadata service for each type of versioned item, so for
> > example flows could have their metadata in one place and extensions
> > somewhere else, but then the issue is that buckets can have multiple
> > types of items so there still has to be something that ties together
> > the knowledge of all items across all buckets, which is currently what
> > the metadata service is.
> >
> > I'm also not exactly sure how we'd implement all of the current access
> > patterns in the MetadataService [1] backed by git. There are a
> > significant number of methods to implement that have been built around
> > the idea of SQL access since there was no intention of this being
> > pluggable. Plus, it continues to grow with all of the extension
> > registry work: there are about 7 new DB tables so far to support the
> > extension registry.
> >
> > 3) What would the UX be in NiFi UI and NiFi Registry UI?
> >
> > From the description of the idea, I'm not totally sure if the
> > branching concept would be completely behind the scenes or not, but if
> > it was going to be a first class concept then we'd need to put some
> > design and thought into how it fits into the UX.
> >
> > Thanks,
> >
> > Bryan
> >
> > [1]
> > https://github.com/apache/nifi-registry/blob/master/nifi-registry-core/nifi-registry-framework/src/main/java/org/apache/nifi/registry/service/MetadataService.java
> >
> > On Tue, Mar 5, 2019 at 11:08 AM Bryan Rosander <bryanrosander@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > We're trying to implement a feature branch workflow with NiFi Registry.
> > > The basic idea is that you'd be able to have one or more branches off of
> > > master and then the capability to merge changes back in when they're
> > > ready.
> > >
> > > Our flow has many versioned process groups that are nested several layers
> > > deep.
> > >
> > > There are a couple of approaches that we're thinking about:
> > >
> > > NiPyApi [1] (hacky but should work without registry changes):
> > > 1. Keep track of the versions at the time the branch was created
> > > 2. Use NiPyApi to compile a list of flows changed since the branch
> > > 3. Apply changes to the master registry from the bottom process group(s)
> > > up, updating the version numbers of child process groups as we go, and
> > > write the result out to some sort of patch file
> > > 4. Verify that the patch file looks right, then use NiPyApi to apply it
> > > (peer review could happen as part of this step)
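Steps 2 and 3 of the NiPyApi approach can be sketched in plain Java, independent of NiPyApi itself. Assuming a hypothetical `VersionedGroup` tree and maps from flow id to version number, `changedSince` finds flows whose version differs from the one recorded when the branch was created, and `plan` walks the tree post-order so children are committed before parents. A parent is included whenever a descendant is, because its reference to the child's version has to be bumped.

```java
import java.util.*;

// Hypothetical model of a versioned process group and its nested versioned children.
record VersionedGroup(String flowId, List<VersionedGroup> children) {}

final class BottomUpPlanner {
    // Step 2: flows whose current version differs from (or is missing in) the branch-time snapshot.
    static Set<String> changedSince(Map<String, Integer> atBranch, Map<String, Integer> current) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, Integer> e : current.entrySet()) {
            if (!e.getValue().equals(atBranch.get(e.getKey()))) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    // Step 3: flow ids to commit, children before parents.
    static List<String> plan(VersionedGroup root, Set<String> changed) {
        List<String> order = new ArrayList<>();
        visit(root, changed, order);
        return order;
    }

    // Post-order walk; returns true if this group needs a new version.
    private static boolean visit(VersionedGroup g, Set<String> changed, List<String> order) {
        boolean childUpdated = false;
        for (VersionedGroup child : g.children()) {
            // Non-short-circuit |= so every child is visited.
            childUpdated |= visit(child, changed, order);
        }
        boolean needsUpdate = childUpdated || changed.contains(g.flowId());
        if (needsUpdate) {
            order.add(g.flowId());
        }
        return needsUpdate;
    }
}
```

For a tree `root -> a -> a1` with a sibling `b`, a change to `a1` alone yields the order `a1`, `a`, `root`, and `b` is left untouched.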
> > >
> > > Git-Based:
> > > Create a git-backed MetadataService that can coexist with the
> > > GitFlowPersistenceProvider (sharing a repository without interfering
> > > with each other's metadata)
> > >
> > > Git already supports a branching workflow, so this would allow us to
> > > branch off master in git and spin up a registry pointed at the branch. We
> > > could also use pull requests to update the master registry (currently
> > > that would require bouncing the master registry, but that could be
> > > improved as well).
> > >
> > > Further out, we could also store pointers to binaries in git and then
> > > have a blob store for accessing them (S3, NFS, SFTP, whatever), similar
> > > to how git-lfs [2] works, so that it would be possible to use git
> > > workflows for managing branching of extensions as well.
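For reference, a git-lfs pointer is a small text file committed in place of the binary, recording the spec version, the SHA-256 object id, and the size in bytes. The Java sketch below builds such a pointer for an arbitrary byte array; the class name `LfsPointer` is hypothetical, and storing the bytes in a blob store (S3, NFS, SFTP) keyed by that object id is left out.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class LfsPointer {
    // Builds the pointer text that would be committed to git instead of the binary itself.
    static String pointerFor(byte[] content) {
        return "version https://git-lfs.github.com/spec/v1\n"
                + "oid sha256:" + sha256Hex(content) + "\n"
                + "size " + content.length + "\n";
    }

    // Lowercase hex SHA-256 digest, usable as the blob store key.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

Because the pointer is plain text, branching and merging it in git works like any other file, while the extension binary itself lives only in the blob store.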
> > >
> > > Note: I'm not suggesting forcing git on all registry users, just having
> > > it as a configuration option :)
> > >
> > > [1] https://github.com/Chaffelson/nipyapi
> > > [2] https://git-lfs.github.com/
> > >
> > > Thanks,
> > > Bryan
> >
