nifi-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matt Burgess <mattyb...@apache.org>
Subject Re: Dynamic Mapping
Date Thu, 04 Oct 2018 20:22:38 GMT
Another thing I'd mention about JOLT is that it works by matching
"nodes" as it walks the "tree". That means in certain cases you can
include fields in the spec that may not match a node in one input, but
would match in another. So in the fullName case, you might be able to
write one spec that works for both examples, by matching the
"middleName" node along with the firstName and lastName nodes, and
building the fullName node using a (possible) match on middleName.
When middleName doesn't exist, it shouldn't contribute anything to the
fullName field.

Not to say that this will work for your case, but it is an interesting
feature of JOLT where you don't need exact schemas, because you're
writing the rules for each match. NiFi's Record API has some similar
interesting features, where you can often have an "overall" schema
that includes all possible fields, and if they don't exist in the
input/output, they will be ignored. Of course, both approaches have
their own caveats, but wanted to bring these up in case it helps you
for your flow.

Regards,
Matt

P.S.  NiFi 1.8.0 will have JoltTransformRecord, so you'll be able to
use JOLT on any input we have a Reader for, not just JSON.
On Thu, Oct 4, 2018 at 4:11 PM Bryan Bende <bbende@gmail.com> wrote:
>
> How are you currently defining schemas and transforms? or are you free
> to choose depending what works best for building the data flow?
>
> If you are willing to use JOLT to define the transform then you have a
> few options...
>
> 1) As Joe mentioned, you can send the GUID in a header which becomes a
> flow file attribute and use RouteOnAttribute to route to the
> appropriate JoltTransformJSON processor. This requires the processor
> is setup ahead of time for each GUID.
>
> 2) As Carlos mentioned, you can put all the JOLT specs somewhere like
> a relational database, or maybe the distributed map cache, and then
> build your flow to take the GUID from the flow file attributes and
> retrieve the JOLT spec into a flow file attribute, and then have a
> single JoltTransformJson that sets the Jolt Spec to a dynamic value
> from the flow file attribute.
>
> 3) If you don't want to store the JOLT specs somewhere, then you could
> actually put the spec in a header as part of the event creation that
> is sent over, the header then becomes a flow file attribute and would
> be referenced the same way as in #2. This may not be that great if the
> JOLT specs are very large.
>
> The record processors would work well if you had simple variations of
> the same schema, like System A has "first name, last name" and System
> B only wants "first name", but once you have more complex logic like
> your example where two fields have to be read as one field, etc, then
> it becomes hard to make the flow dynamic. You could apply that logic
> with UpdateRecord most likely, but you would need a pre-defined
> UpdateRecord for each transform, and it would likely not be as
> powerful as a JOLT transform.
>
>
> On Thu, Oct 4, 2018 at 3:14 PM Ryan H <ryan.howell.development@gmail.com> wrote:
> >
> > Joe/Carlos,
> >
> > Thanks for the replies. Here is a little more information that hopefully clarifies
the problem statement. We have System-A and System-B. Each time we bring on a new User, they
get a new Organization (this will be the unique id) in both System-A and System-B. The new
User will go in and set up their objects for their Organization (the schemas unique to the
user). Say the first schema they set up is for a Customer object. They start with a base schema
for a Customer that is predefined.
> >
> > The Base Schema and Mappings
> > System-A Base Schema
> > First_Name
> > Last_Name
> >
> > System-B Schema
> > Full_Name
> >
> > Base Mapping
> > First_Name + Last_Name -> Full_Name
> >
> > ----
> >
> > First User's Schema and mapping for their Organization:
> > System-A Organization 1
> > First_Name
> > Last_Name
> >
> > System-B Schema
> > Full_Name
> >
> > Organization 1 Mapping
> > First_Name + Last_Name -> Full_Name
> >
> > First user makes no changes to the schema or mapping.
> >
> > ----
> >
> > Second user comes onboard. They don't like the base schema, so they make theirs
custom for their Organization.
> >
> > They do:
> > System-A Organization 2
> > First_Name
> > Middle_Name
> > Last_Name
> >
> > System-B Schema
> > Full_Name
> >
> > Organization 1 Mapping
> > First_Name + Middle_Name + Last_Name -> Full_Name
> >
> > I should mention that System-A is the only modifiable schema; System-B is fixed.
The only thing that can change is the System-A schema and its mapping to System-B and will
be unique to each Organization.
> >
> > Hope this clarifies a bit more.
> >
> > Cheers,
> >
> > Ryan H
> >
> >
> > On Thu, Oct 4, 2018 at 2:39 PM Joe Witt <joe.witt@gmail.com> wrote:
> >>
> >> Ryan
> >>
> >> I am not entirely sure I fully understand the scenario so read the
> >> following with a bit of caution.
> >>
> >> But the gist I think i'm reading is that system A generates data
> >> against a given schema referenceable with some guid say 'GUID1'.  For
> >> every GUID1 from System A there is a similar/but possibly different
> >> schema in System B referenceable again by some guid say GUID1_B.
> >>
> >> Are those base schemas you referenced say GUID1 and GUID1_B compatible
> >> in that all fields of import in GUID1 schema will be in GUID1_B
> >> schema?  As-in can you treat GUID1_B schemas as a superset of GUID1
> >> schemas?
> >>
> >> In any event, it is very possible that you can achieve this using the
> >> Record processors and Record oriented controller services.
> >>
> >> You can 'read' data from system A in NiFi using record readers that
> >> reference schemas from systemA and 'write' data using system B schemas
> >> in NiFi.
> >>
> >> The record oriented processors combined with their pluggable record
> >> readers and writers allows this to happen. The schema for the reader
> >> and writer can be different and the processors will map things over by
> >> field names/types.  That leaves the problem of 'how to access' the
> >> schema information so NiFi knows what to do during read/write phases.
> >> You can either ensure all these schemas are in the nifi schema
> >> registry and can be looked up by some well established name and naming
> >> scheme.  You could also do it via a SQL lookup if you have the schemas
> >> in some database (similar to what Carlos might be suggesting but
> >> without Jolt).  Or some other way..
> >>
> >> Now, if your mapping logic from System A schema to System B schema is
> >> more complex then you'll want something like JOLT most likely to help
> >> manage those transforms.
> >>
> >> You can also do some routing on the requests from System A to
> >> specialized JOLT mappers for each case of converting to System B
> >> schemas.  That will end up in a lot of config but it is likely you can
> >> paramaterize this well using versioned flows and expression langauge
> >> statements in the variable registry.
> >>
> >> Thanks
> >> Joe
> >> On Thu, Oct 4, 2018 at 1:35 PM Ryan H <ryan.howell.development@gmail.com>
wrote:
> >> >
> >> > Hi All,
> >> >
> >> > I have been working on an integration between two systems with NiFi in
the middle as the integration point. Basically when an event happens on System-A, such as
a record update, those changes need to be propagated to System-B. For the first use case,
I have set up a data flow that listens for incoming requests from System-A, performs a mapping,
the sends the mapped data to System-B.
> >> >
> >> > Generalized Flow for "Create_Event" (dumbed down significantly):
> >> > System-A "Create_Event" -> HandleHTTPRequest -> JoltTransformJSON
-> InvokeHTTP -> System-B "Create_Event"
> >> >
> >> > This works great for the first case with a predefined mapping in JoltTransformJSON.
Now I want to generalize it so that the same data flow can be used for all Create_Event's
on System-A.
> >> >
> >> > Here is where the issue comes in. There is a base schema for System-A that
has a base mapping to the base schema in System-B. Users of the System have the ability to
"extend" the base schema to add/remove fields and modify the base mapping. So each time the
Create_Event happens, the mapping that is used should be the unique mapping spec associated
to that user (call it a GUID that comes along with the request).
> >> >
> >> > The data flow is the exact same for all Create_Events, except for the mapping,
which will be unique to the user.
> >> >
> >> > Does anyone know of a way to load up a different mapping to be used on
a per-request basis? I used JoltTransformJSON just as a proof of concept, so it does not need
to be a Jolt spec and can be modified to meet the needs of whatever would work for this.
> >> >
> >> > I started to look into schema registry, but kind of got lost a bit and
wasn't sure if it could be applied to this situation.
> >> >
> >> > Any Help is, as always, greatly appreciated.
> >> >
> >> >
> >> > Cheers,
> >> >
> >> > Ryan H.

Mime
View raw message