nifi-users mailing list archives

From James McMahon <jsmcmah...@gmail.com>
Subject Re: Creating an attribute
Date Sat, 18 Aug 2018 11:43:41 GMT
I do have a follow-up question. In my example I have oversimplified the
structure. In my production space I have two complicating factors: the
number of fields can vary, and only three fields are mandatory and so must
be there. And the field order can vary: the messages posted to the queue
that we consume from have no requirement to enforce the order of the
fields. All I know is that I will have my three guaranteed fields. Can
UpdateRecord still be used, referencing the three fields explicitly,
telling it to put my new field(s) after one of those wherever it may be
in the object, and indicating it should then include all other keys/values
in the object?
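
For reference, the ordering requirement described above can be sketched in plain Python, outside NiFi. This is not UpdateRecord configuration and does not settle whether UpdateRecord can control field position; `insert_after` is a hypothetical helper, the sample payload with an `extra` field is made up, and the sketch assumes Python 3.7+, where dicts preserve insertion order.

```python
import json


def insert_after(record, anchor_key, new_fields):
    """Return a copy of record with new_fields placed right after anchor_key.

    All other keys keep their original relative order.
    (Hypothetical helper for illustration, not a NiFi API.)
    """
    if anchor_key not in record:
        raise KeyError(f"mandatory field {anchor_key!r} is missing")
    if anchor_key in new_fields:
        raise ValueError("the anchor field cannot also be a new field")
    result = {}
    for key, value in record.items():
        if key in new_fields:
            continue  # drop any stale copy; the new value goes after the anchor
        result[key] = value
        if key == anchor_key:
            result.update(new_fields)
    return result


# Made-up payload: two of the guaranteed fields plus one optional field.
payload = '{"first":"marilyn","last":"manson","extra":1}'
record = json.loads(payload)
updated = insert_after(record, "last", {"myKey": "123abc"})
print(json.dumps(updated, separators=(",", ":")))
```

Note that JSON objects are unordered by specification, so a downstream consumer that parses the JSON should not need the new field to sit at a particular position.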

On Fri, Aug 17, 2018 at 4:24 PM, Matt Burgess <mattyb149@apache.org> wrote:

> Jim,
>
> You can use UpdateRecord for this, your input schema would have "last"
> and "first" in it (and I think you can have an optional "myKey" field
> so you can use the same schema for the writer), and the output schema
> would have all three fields in it. Then you'd set the Replacement
> Value Strategy to "Literal Value" and add a user-defined property in
> UpdateRecord called "/myKey" set to "${myKey}". This will take the
> value from the attribute myKey and put it at the root of each record
> in a field called myKey.  Since this is JSON, you could do the same
> with JoltTransformJSON, with a Default spec setting "myKey":
> "${myKey}". Not sure which is faster in this case, since there appears
> to be a single record.
>
> This also works if there are multiple records in the flow file, as
> long as the myKey field is to have the same value for all records
> (since there is only one myKey attribute value for the whole flow
> file).  If there are multiple records and they each need a different value, you have a
> "lookup" use case on your hands, where you'd want to match some value
> against some lookup service, and it would fill in that field from the
> value supplied by the lookup service (you'd use LookupRecord with a
> LookupService for this). Or if all else fails, there is the Split pattern if you truly
> do want/need to process one JSON object at a time.
>
> Regards,
> Matt
>
> On Fri, Aug 17, 2018 at 4:06 PM James McMahon <jsmcmahon3@gmail.com>
> wrote:
> >
> > I do appreciate your point, Tim and Lee. What if I do this instead:
> append select attributes to my data payload. Would that minimize the impact
> on RAM? Can I do that?
> >
> > More specifically, my data payload is a string representation of a JSON
> object, like so:
> > {"last":"manson","first":"marilyn"}
> > and I have an attribute named myKey that contains the value "123abc"
> >
> > Is there a processor that allows me to wind up with this string
> representation of JSON:
> > {"last":"manson","first":"marilyn", "myKey":"123abc"}
> >
> > If I could do that, I could avoid loading the entire data payload into
> an attribute, and manipulate them in a python script called by
> ExecuteScript. I know how to do that, I don't know how to do the above with
> native processors.
> > Thanks in advance for your help.
> >
> > On Fri, Aug 17, 2018 at 2:02 PM, Lee Laim <lee.laim@gmail.com> wrote:
> >>
> >> Jim,
> >> I think the ExtractText processor with a large enough Maximum Capture
> Group Length (default: 1024) will do that. Though I share Tim’s concerns
> when you scale up.
> >> Thanks,
> >> Lee
> >>
> >>
> >> > On Aug 17, 2018, at 11:52 AM, Timothy Tschampel <tim.tschampel@
> vivacehealthsolutions.com> wrote:
> >> >
> >> >
> >> > This may not be applicable to your use case depending on message
> volume / # of attributes; but I would avoid putting payloads into
> attributes for scalability reasons (especially RAM usage).
> >> >
> >> >
> >> >> On Aug 17, 2018, at 10:47 AM, James McMahon <jsmcmahon3@gmail.com>
> wrote:
> >> >>
> >> >> I have flowfiles with data payloads that represent small strings of
> text (messages consumed from AMQP queues). I want to create an attribute
> that holds the entire payload for downstream use. How can I capture the
> entire data payload of a flowfile in a new attribute on the flowfile? Thank
> you in advance for your help. -Jim
> >> >
> >
> >
>
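
The result described in the quoted reply, one flow file attribute value written to the root of every record, can be sketched in plain Python. This only illustrates the expected output; it is not NiFi code. The function name `add_attribute_field` and the plain dict standing in for flow file attributes are assumptions made for the example.

```python
import json


def add_attribute_field(content, attributes, name):
    """Add attributes[name] as a root-level field of every record in content.

    content is a JSON string holding either one object or an array of objects.
    (Hypothetical helper; a dict stands in for the flow file attributes.)
    """
    parsed = json.loads(content)
    # Treat a single object as a one-record list so both shapes share one loop.
    records = parsed if isinstance(parsed, list) else [parsed]
    for record in records:
        record[name] = attributes[name]  # same value for every record
    return json.dumps(parsed, separators=(",", ":"))


attributes = {"myKey": "123abc"}
single = add_attribute_field('{"last":"manson","first":"marilyn"}', attributes, "myKey")
print(single)
```

Because the value comes from a single attribute, every record in the flow file receives the same `myKey`, which matches the limitation noted in the reply.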
